12. List-based Set implementations

We will look at five different implementations of a Set, each based on a linked list → each using a different synchronization pattern.

Granularity types:

coarse-grained locking
- one global lock
fine-grained locking
- one lock per item
optimistic synchronization
- assume conflicts are rare (search without locking, then lock, check for conflicts, and retry if the check fails)
lazy synchronization
- delay or simplify cleanup work
- mark things as “deleted”
lock-free synchronization
- use atomic hardware primitives (like CAS) exclusively to manage concurrent access, without any mutex locks
- immune to deadlock, and can perform better under certain conditions

The list-based Set provides three operations: add(x), remove(x), and contains(x).

12.1 Coarse-Grained Locking

Coarse-grained locking simply synchronizes add, remove, and contains on one global lock.

Every operation has to traverse the list in $O(n)$ while holding that lock, so all operations are serialized. This is very inefficient under contention!
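As an illustration, a coarse-grained list could look roughly like the following sketch. The class name `CoarseListSet`, the integer keys, and the two sentinel nodes are assumptions made for this example and are not taken from the lecture.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: sorted linked list guarded by one global lock.
// Class name, int keys and sentinel nodes are assumptions of this example.
class CoarseListSet {
    private static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Sentinels: user keys must lie strictly between MIN_VALUE and MAX_VALUE.
    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));
    private final ReentrantLock lock = new ReentrantLock();

    boolean add(int key) {
        lock.lock();                       // every operation takes the same lock
        try {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; } // O(n) scan
            if (curr.key == key) return false;   // already present
            pred.next = new Node(key, curr);
            return true;
        } finally {
            lock.unlock();
        }
    }

    boolean remove(int key) {
        lock.lock();
        try {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; }
            if (curr.key != key) return false;   // not present
            pred.next = curr.next;
            return true;
        } finally {
            lock.unlock();
        }
    }

    boolean contains(int key) {
        lock.lock();
        try {
            Node curr = head.next;
            while (curr.key < key) curr = curr.next;
            return curr.key == key;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the lock is held for the whole scan, two threads working on opposite ends of the list still wait for each other.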

12.2 Fine-Grained Locking

Fine-grained aims to split the object (list) into pieces (nodes) with separate locks, allowing concurrent operations on disjoint parts.
→ modifying an element at the start and removing one at the end do not interfere, so they should be able to run concurrently…

12.2.1 Naive Implementation

This collapses when we have conflicting operations:

12.2.2 Hand-over-Hand Locking (Lock Coupling)

Like Mowgli/Tarzan swinging from vine to vine in the jungle.

The Problem: Modifying the list structure requires coordinated, exclusive access to multiple adjacent nodes simultaneously.

Removing c requires locking both b (to change b.next) and c (to read c.next).

Fix? lock coupling:

To traverse, always hold the lock on the current node you are examining.
To move from curr to next:
1. Lock next (curr.next.lock()).
2. Unlock curr (curr.unlock()).
To perform an operation (like delete) involving pred and curr:
1. Traverse until pred and curr are the nodes you need.
2. You will arrive holding the lock on pred.
3. Lock curr.
4. Now you hold locks on both pred and curr. Perform the modification (e.g., pred.next = curr.next).
5. Unlock curr.
6. Unlock pred.

Why no deadlock?

All locks are acquired in the same order → traversal order of the list.
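The lock-coupling steps above could be sketched as follows. The class name `FineListSet`, the integer keys, and the sentinel nodes are assumptions of this example. Note that this sketch holds two locks (pred and curr) during the whole traversal, which is one common way to write hand-over-hand locking.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of hand-over-hand locking; names and sentinels are assumptions.
class FineListSet {
    private static final class Node {
        final int key;
        Node next;
        final ReentrantLock lock = new ReentrantLock();  // one lock per node
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Sentinels: user keys must lie strictly between MIN_VALUE and MAX_VALUE.
    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    boolean add(int key) {
        Node pred = head;
        pred.lock.lock();
        Node curr = pred.next;
        curr.lock.lock();
        try {
            while (curr.key < key) {
                pred.lock.unlock();   // let go of the node behind us
                pred = curr;          // we still hold this lock
                curr = curr.next;
                curr.lock.lock();     // grab the next node before moving on
            }
            if (curr.key == key) return false;
            pred.next = new Node(key, curr);   // both neighbours are locked
            return true;
        } finally {
            curr.lock.unlock();
            pred.lock.unlock();
        }
    }

    boolean remove(int key) {
        Node pred = head;
        pred.lock.lock();
        Node curr = pred.next;
        curr.lock.lock();
        try {
            while (curr.key < key) {
                pred.lock.unlock();
                pred = curr;
                curr = curr.next;
                curr.lock.lock();
            }
            if (curr.key != key) return false;
            pred.next = curr.next;   // safe: pred and curr are both locked
            return true;
        } finally {
            curr.lock.unlock();
            pred.lock.unlock();
        }
    }
}
```

All threads lock nodes in list order, starting from head, which is the reason given above for why no deadlock can occur.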

Cons of this:

lots of acquire and release operations → overhead
threads accessing disjoint parts can still block each other!
- they would need to “pass by each other”, which is not possible
- the first thread sets the tempo for all the ones behind it
- basically, no overtaking

12.3 Optimistic Locking

Idea: find nodes without locking (i.e. find the pointers)

then lock them and check that everything is still OK

Perform the operation in stages:

Find Nodes (Lock-Free): Traverse the list without acquiring any locks to find the relevant nodes
Lock Nodes: Once the target nodes are found, acquire locks only on those specific nodes.
Validate: After acquiring locks, check if the situation is still valid.
- Did another thread modify the list structure (e.g., delete pred or curr, insert between them) after stage 1 but before we acquired the locks in stage 2?
Perform Operation: If validation succeeds, perform the actual modification (e.g., update pointers).
Unlock: Release the locks. If validation fails, release the locks and retry (or signal failure).

Validation Summary

The validate(pred, curr) method is called while holding the locks on pred and curr. It has to:

check if pred is still reachable from head

check if curr is still the immediate successor of pred (pred.next == curr)
implicitly check that pred and curr are not marked as deleted (relevant for lazy lists)

Proof:

remove(c): if validation passes while we hold the locks on b and c,
- no other thread can be deleting b or c, or inserting between them.
remove(c) where c is not found: we can safely return false if validation passes while we hold the locks on both b and d,
- because no other thread could have inserted c between them.

12.3.1 Example `add(c)`

Traverse without locks, finding that c should go between b and d.
Lock b and d.
Validate: Is b still reachable? Is d still the successor of b?
If valid, set c.next = d, b.next = c.
Unlock b and d.
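The stages (scan, lock, validate, modify, unlock) could be sketched like this. The class name `OptimisticListSet`, the integer keys, the sentinel nodes, and the `volatile` next field are assumptions of this example. The `volatile` is there so that the unlocked scan sees up-to-date pointers under the Java memory model.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of an optimistic list; names and sentinels are assumptions.
class OptimisticListSet {
    private static final class Node {
        final int key;
        volatile Node next;   // read by scans that hold no lock
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Sentinels: user keys must lie strictly between MIN_VALUE and MAX_VALUE.
    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    // Called while holding the locks on pred and curr (second traversal).
    private boolean validate(Node pred, Node curr) {
        Node node = head;
        while (node.key <= pred.key) {
            if (node == pred) return pred.next == curr; // reachable and adjacent
            node = node.next;
        }
        return false;   // pred is no longer reachable from head
    }

    boolean add(int key) {
        while (true) {                                   // retry loop
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; } // no locks
            pred.lock.lock();
            curr.lock.lock();
            try {
                if (validate(pred, curr)) {
                    if (curr.key == key) return false;
                    // The new node's next is set before it is linked in.
                    pred.next = new Node(key, curr);
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }

    boolean remove(int key) {
        while (true) {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();
            try {
                if (validate(pred, curr)) {
                    if (curr.key != key) return false;
                    pred.next = curr.next;
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }
}
```

If validation fails, the `finally` block releases both locks and the loop starts a fresh scan from head.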

Failure 1: (Deletion)

the predecessor of the node we want to delete could have been detached from the list in the meantime
then our deletion changes nothing relevant → a node outside of the list is not traversed anymore
our operation was made moot

Validate: have to check pred still reachable

Failure 2: (Insertion)

another thread could have arrived and inserted a new node between the two nodes we wanted to insert between
then we validate → validation fails
we have to redo everything (rescan)

Validate: have to check pred.next == succ

Trade-Offs:
Pro

no contention on traversals
- the traversals are even wait-free → each one completes in a finite number of its own steps, regardless of the speed or pausing of other threads
fewer lock acquisitions
Con
double traversal (scan, lock, validate-scan, modify)
contains() needs locks → if we don’t validate, we might return true for a node currently being deleted
not starvation-free → threads might repeatedly fail validation due to high contention

Note:

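if we do the two pointer updates of add in the other order (b.next = c before c.next = d):

another thread traversing here would find a dead end at the new entry c, whose next pointer is not set yet.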

12.4 Lazy Synchronization

The lazy list approach is similar to the optimistic list, but:

we scan only once
contains() never locks: a simple scan is sufficient.

How? Removing nodes causes trouble →

uses deletion-markers → remove first marks the node as “logically deleted”
Physical Deletion (Lazy) is done later
- potentially by the same remove operation or by subsequent traversals

Key Invariant: Every unmarked node in the list must always be reachable from the head. contains only returns true for unmarked nodes.

Implementation:

→ first logically remove → then physically remove!

Validation is now slightly modified:

check that pred is still reachable (by checking that it is not marked) and that it points to curr
check that curr is not already marked as deleted.

Wait-free contains implementation:

contains is wait-free:

does not lock
does not have a “retry-loop”
no “stranding” → deletion is only logical at first: the next pointer of a removed node still points forward, thus we can always advance in the list
for the marked bit, we can use a volatile variable instead of mutual exclusion
- if we didn’t, we would have a race condition, as contains does not acquire the lock on that node
- we do not have to use AtomicMarkableReference → we just need to read the latest value, as guaranteed by volatile
- in a lock-free list, we do need an AtomicMarkableReference, because removes aren’t serialised by locks.
  - → without it, the mark and the next pointer could race
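A lazy list with the modified validation and a wait-free contains could be sketched as follows. The class name `LazyListSet`, the integer keys, and the sentinel nodes are assumptions of this example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a lazy list; names and sentinels are assumptions.
class LazyListSet {
    private static final class Node {
        final int key;
        volatile Node next;               // read by unlocked scans
        volatile boolean marked = false;  // true = logically deleted
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Sentinels: user keys must lie strictly between MIN_VALUE and MAX_VALUE.
    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    // Called with pred and curr locked; no second traversal is needed.
    private boolean validate(Node pred, Node curr) {
        return !pred.marked && !curr.marked && pred.next == curr;
    }

    boolean add(int key) {
        while (true) {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();
            try {
                if (validate(pred, curr)) {
                    if (curr.key == key) return false;
                    pred.next = new Node(key, curr);
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }

    boolean remove(int key) {
        while (true) {
            Node pred = head, curr = head.next;
            while (curr.key < key) { pred = curr; curr = curr.next; }
            pred.lock.lock();
            curr.lock.lock();
            try {
                if (validate(pred, curr)) {
                    if (curr.key != key) return false;
                    curr.marked = true;      // 1. logical removal
                    pred.next = curr.next;   // 2. physical removal
                    return true;
                }
            } finally {
                curr.lock.unlock();
                pred.lock.unlock();
            }
        }
    }

    // Wait-free: no locks, no retry loop, one pass over the list.
    boolean contains(int key) {
        Node curr = head;
        while (curr.key < key) curr = curr.next;
        return curr.key == key && !curr.marked;
    }
}
```

Here remove does the physical unlinking itself, right after marking. Leaving it to later traversals, as mentioned above, would be a variation of the same idea.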

12.5 Skip Lists

Collection of elements (without duplicates), same add/remove/find algorithms.
→ many calls to find, fewer to add, and far fewer to remove

We use a Las Vegas-style randomised algorithm.

it will always find the element (or insert/remove it)
but the runtime is random

It’s a sorted multi-level list.

the node “height” is probabilistic
- $\Pr[\text{height} = n] = 0.5^{n}$
- no rebalancing.
  the list forms a kind of “binary tree” probabilistically. Because the probability is $0.5^{n}$, we get this tree structure on average, by the law of large numbers.
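One way to draw a height with $\Pr[\text{height} = n] = 0.5^{n}$ is to flip a fair coin until it comes up tails. The sketch below does this. The class name, the method name, and the `maxHeight` cap are assumptions of this example. With the cap, all remaining probability mass ends up on the top level.

```java
import java.util.Random;

// Sketch: geometric height distribution via repeated fair coin flips.
class SkipListHeights {
    // Returns n >= 1 with probability 0.5^n for n < maxHeight;
    // the cap maxHeight is an assumption and absorbs the remaining tail.
    static int randomHeight(Random rng, int maxHeight) {
        int height = 1;
        while (height < maxHeight && rng.nextBoolean()) {
            height++;   // each extra level is half as likely as the one below
        }
        return height;
    }
}
```

About half of all nodes then have height 1, a quarter have height 2, and so on, which is what gives the upper levels their "express lane" character.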

12.5.1 Sequential find

The “higher” you are, the “faster” you can traverse the list.

in practice, this looks like the following:

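a node has multiple successor pointers, one per level → we check them from the highest level down. On a level, we proceed to the successor if its key is $<$ the target number; otherwise we drop down one level.

thanks to the list being sorted, we can search it much like a “binary tree” (left/right)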
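A sequential find along these lines could be sketched as follows. The names `SkipNode` and `SkipListSearch`, the integer keys, and the use of `null` as the end of each level are assumptions of this example.

```java
// Sketch of a sequential skip-list search; all names are assumptions.
class SkipNode {
    final int key;
    final SkipNode[] next;   // next[i] = successor on level i

    SkipNode(int key, int height) {
        this.key = key;
        this.next = new SkipNode[height];
    }
}

class SkipListSearch {
    // head is a sentinel whose height equals the maximum height in the list.
    static boolean contains(SkipNode head, int key) {
        SkipNode pred = head;
        for (int level = head.next.length - 1; level >= 0; level--) {
            SkipNode curr = pred.next[level];
            // Move right while the successor is still smaller than the target.
            while (curr != null && curr.key < key) {
                pred = curr;
                curr = pred.next[level];
            }
            if (curr != null && curr.key == key) return true;
            // Otherwise drop down one level and continue from pred.
        }
        return false;
    }
}
```

The search never moves backwards: each level starts where the level above stopped.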

12.5.2 Add

First, we determine the height of the new node → choose it randomly, following the height distribution above (not uniformly).

Find predecessors?

normal search, starting at the height of the element to be inserted
for each level:
- remember the last node you read before dropping down
this gives, for the insertion of $x_{i}$, one predecessor per level:
- each of these nodes is $< x_{i}$
- each has a pointer to an element $v_{i} > x_{i}$
  - these are the pointers we update
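A sequential add that collects the predecessors and then splices in the new node could be sketched like this. The class name `SkipListSet`, the `MAX_HEIGHT` cap, the seeded `Random`, and the small `contains` helper are assumptions of this example. Unlike the description above, the sketch starts descending at the top level and only records predecessors on levels below the new node's height.

```java
import java.util.Random;

// Sketch of sequential skip-list insertion; all names are assumptions.
class SkipListSet {
    private static final int MAX_HEIGHT = 16;   // assumed cap on node height

    private static final class Node {
        final int key;
        final Node[] next;   // next[i] = successor on level i
        Node(int key, int height) { this.key = key; this.next = new Node[height]; }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_HEIGHT);
    private final Random rng;

    SkipListSet(long seed) { this.rng = new Random(seed); }

    private int randomHeight() {
        int height = 1;
        while (height < MAX_HEIGHT && rng.nextBoolean()) height++;
        return height;
    }

    boolean add(int key) {
        int height = randomHeight();
        Node[] preds = new Node[height];
        Node pred = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            Node curr = pred.next[level];
            while (curr != null && curr.key < key) {
                pred = curr;
                curr = pred.next[level];
            }
            // Last node read on this level before dropping down.
            if (level < height) preds[level] = pred;
        }
        Node succ = preds[0].next[0];
        if (succ != null && succ.key == key) return false;   // no duplicates
        Node node = new Node(key, height);
        for (int level = 0; level < height; level++) {
            node.next[level] = preds[level].next[level];  // point at v_i > x_i
            preds[level].next[level] = node;              // update the predecessor
        }
        return true;
    }

    boolean contains(int key) {
        Node pred = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            Node curr = pred.next[level];
            while (curr != null && curr.key < key) {
                pred = curr;
                curr = pred.next[level];
            }
        }
        Node succ = pred.next[0];
        return succ != null && succ.key == key;
    }
}
```

No rebalancing step is needed after the splice, since the random heights alone keep the expected structure balanced.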

Niklas @ ETHZ
