13. Lock-Free and Wait-Free Datastructures

Recap: we saw spinlocks vs. scheduled locks (wait, await).
Now we turn to lock-free programming and lock-free data structures.

Spinlock problems:

no scheduling fairness → no FIFO behaviour
computing resources wasted, performance degradation
no notification mechanism

Problems of locks with waiting (semaphores, mutexes and monitors are often implemented with such a lock):

require OS support
the lock's own data structures need to be protected against concurrent access
- we do this using spinlocks
higher wakeup latency → the scheduler is involved
if a thread is delayed while in the critical section (CS) → all waiting threads suffer
what if a thread dies in the critical section?
- prone to deadlocks
locks cannot be used in interrupt handlers: if the handler waits for a lock held by the code it interrupted, that code can never run to release it

13.1 Lock-Free Programming

Lock-freedom (the non-blocking counterpart of deadlock-freedom): at least one thread always makes progress in a finite number of steps, even if other threads run concurrently.

implies system-wide progress, but not freedom from starvation
Wait-freedom (the non-blocking counterpart of starvation-freedom): every thread makes progress in a finite number of its own steps.
implies freedom from starvation

Wait-freedom is the stronger property: wait-freedom $⟹$ lock-freedom

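The concept names in the two programming paradigms map onto each other:

someone makes progress: deadlock-free (blocking) ↔ lock-free (non-blocking)
everyone makes progress: starvation-free (blocking) ↔ wait-free (non-blocking)

Non-blocking algorithms: with locks (blocking), one thread can delay another thread indefinitely.

non-blocking = failure or suspension of one thread cannot cause failure or suspension of another thread.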

Note, in Java the following holds for ReentrantLock:

We can assume that ReentrantLock is deadlock-free.
If we initialise it as ReentrantLock(true), the fairness parameter is set to true and the lock is also starvation-free.
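As a small illustration of the two constructors, here is a minimal sketch. The class name FairLockDemo and the method guardedIncrement are made up for this example; ReentrantLock, its boolean constructor and isFair() are the standard java.util.concurrent API.

```java
import java.util.concurrent.locks.ReentrantLock;

class FairLockDemo {
    // Default constructor: non-fair lock, a waiting thread may be overtaken.
    static final ReentrantLock unfair = new ReentrantLock();
    // Fairness parameter true: the longest-waiting thread is favoured.
    static final ReentrantLock fair = new ReentrantLock(true);

    static int shared = 0; // hypothetical shared state protected by `fair`

    static int guardedIncrement() {
        fair.lock();
        try {
            return ++shared;
        } finally {
            fair.unlock(); // always release, even if the body throws
        }
    }

    public static void main(String[] args) {
        System.out.println(unfair.isFair());
        System.out.println(fair.isFair());
        System.out.println(guardedIncrement());
    }
}
```

The fair variant usually costs throughput, which is why it is not the default.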

To program lock-free, we use atomic operations.
CAS (compare-and-swap): can be implemented lock-free in hardware

usually even wait-free (but this depends on the platform, e.g. on how the JVM maps it to machine instructions)
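To make the semantics of a single CAS concrete, here is a minimal sketch using AtomicInteger.compareAndSet. The class name CasSemanticsDemo and the method run are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasSemanticsDemo {
    static String run() {
        AtomicInteger x = new AtomicInteger(5);
        boolean first = x.compareAndSet(5, 6);  // expected value matches: x becomes 6
        boolean second = x.compareAndSet(5, 7); // expected value is stale: x stays 6
        return first + " " + second + " " + x.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true false 6"
    }
}
```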

13.1.2 Lock-Free Counter

Use CAS and redo until the counter was incremented by us.

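import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int getVal() {
        return value.get();
    }

    // Increments the counter and returns the new value.
    public int inc() {
        int v;
        do {
            v = value.get(); // snapshot the current value
        } while (!value.compareAndSet(v, v + 1)); // retry if another thread changed it
        return v + 1;
    }
}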

(In Java we would usually just call AtomicInteger.incrementAndGet().)
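For comparison, a minimal sketch of the built-in route: several threads call AtomicInteger.incrementAndGet() and no update is lost. The class name BuiltInCounterDemo and the method countWith are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BuiltInCounterDemo {
    // Starts `threads` threads that each increment the counter `perThread` times.
    static int countWith(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write, no lock
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // prints 40000
    }
}
```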

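This implementation is lock-free but not wait-free → inc() is not guaranteed to terminate in a finite number of steps.

that is because of the retry loop: the CAS of one thread can fail again and again while other threads succeed
in the worst case a single thread loops forever (but every failed CAS means that some other thread made progress)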

A successful CAS suggests that no other thread has written between get() and compareAndSet(), but this is not guaranteed:

we read a
another thread could have written b in between
another thread modifies it again to a
our compareAndSet succeeds, although the value was changed in between
This is the ABA problem.
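The sequence above can be replayed in a single thread. The sketch below first shows that a plain CAS does not notice the a → b → a change, and then a common remedy that is not part of these notes: AtomicStampedReference, which pairs the reference with a version stamp. The class and method names (AbaDemo, plainCasAfterAba, stampedCasAfterAba) are made up for this example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

class AbaDemo {
    // Plain CAS: the value went a -> b -> a, yet the CAS with the old read succeeds.
    static boolean plainCasAfterAba() {
        AtomicReference<String> ref = new AtomicReference<>("a");
        String seen = ref.get();             // we read a
        ref.compareAndSet("a", "b");         // "another thread" writes b
        ref.compareAndSet("b", "a");         // ... and changes it back to a
        return ref.compareAndSet(seen, "c"); // succeeds: the writes go unnoticed
    }

    // Stamped CAS: every write bumps a version stamp, so the old stamp is rejected.
    static boolean stampedCasAfterAba() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("a", 0);
        int[] stamp = new int[1];
        String seen = ref.get(stamp);        // we read (a, stamp 0)
        ref.compareAndSet("a", "b", 0, 1);   // "another thread" writes b
        ref.compareAndSet("b", "a", 1, 2);   // ... and changes it back to a
        // fails: the reference is a again, but the stamp is 2 instead of 0
        return ref.compareAndSet(seen, "c", stamp[0], stamp[0] + 1);
    }

    public static void main(String[] args) {
        System.out.println(plainCasAfterAba());   // prints true
        System.out.println(stampedCasAfterAba()); // prints false
    }
}
```

For the counter, ABA is harmless because only the numeric value matters. It becomes dangerous when the value is a pointer into a data structure that was modified in between.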

13.1.3 DCAS

DCAS = double compare-and-swap

It performs a CAS on two independent memory locations as one atomic step.

As a primitive (CPU operation) it could be used to implement software transactional memory (STM), but mainstream CPUs generally do not provide it.
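Java has no DCAS, so the following sketch only illustrates the semantics: both comparisons and both writes happen as one indivisible step. The synchronized keyword stands in for the hardware guarantee, so this emulation is deliberately not lock-free. The class DcasSketch and its fields are made up for this example.

```java
class DcasSketch {
    int a;
    int b;

    DcasSketch(int a, int b) {
        this.a = a;
        this.b = b;
    }

    // Semantics only: compare both locations, then write both, atomically.
    // synchronized replaces the (assumed) hardware primitive; NOT lock-free.
    synchronized boolean dcas(int expectedA, int newA, int expectedB, int newB) {
        if (a != expectedA || b != expectedB) {
            return false; // at least one location changed: write nothing
        }
        a = newA;
        b = newB;
        return true;
    }
}
```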

13.2 Lock-Free Stack

For a blocking stack, we can just use synchronized (global lock).

Non-blocking Stack implementation:

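we use an atomic reference to the top element.

To push/pop, we:

save the current top element in a local variable
for push: set the new element's .next to the saved top (for pop: read the saved top's .next)
try to CAS top from the saved element to the new element (for pop: to the saved top's .next)
if the CAS fails → retry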
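The steps above can be sketched as follows. This is a minimal version under the assumption that pop returns null on an empty stack; the names LockFreeStack and Node are chosen for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

class LockFreeStack<T> {
    private static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T item) {
        Node<T> node = new Node<>(item);
        Node<T> oldTop;
        do {
            oldTop = top.get();  // save the current top locally
            node.next = oldTop;  // link the new element in front of it
        } while (!top.compareAndSet(oldTop, node)); // retry if top changed meanwhile
    }

    // Returns the popped item, or null if the stack is empty (assumed convention).
    T pop() {
        Node<T> oldTop;
        Node<T> newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
            newTop = oldTop.next;
        } while (!top.compareAndSet(oldTop, newTop)); // retry if top changed meanwhile
        return oldTop.item;
    }
}
```

Each push allocates a fresh node and the garbage collector never recycles a node that is still referenced, so this Java version does not run into the ABA problem on top.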

Pros:

Since this is lock-free → deadlock-free by design.
slightly more performant in a real-world test

Fix: under contention, many CAS attempts fail and have to be retried. With backoff (waiting briefly before a retry) we reduce the contention on top, which makes the stack faster.
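A minimal sketch of randomised exponential backoff, shown on a CAS counter to keep it short; the same wait-and-retry step would go into the retry loops of push and pop. The class name BackoffCounter and the delay bounds are assumptions made for this example, not tuned values.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

class BackoffCounter {
    private static final long MIN_DELAY_NANOS = 1_000;     // assumed starting bound
    private static final long MAX_DELAY_NANOS = 1_000_000; // assumed upper bound

    private final AtomicInteger value = new AtomicInteger();

    int inc() {
        long limit = MIN_DELAY_NANOS;
        while (true) {
            int v = value.get();
            if (value.compareAndSet(v, v + 1)) {
                return v + 1;
            }
            // The CAS failed, so there is contention: pause for a random time
            // below the current limit, then double the limit for the next failure.
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(1, limit + 1));
            limit = Math.min(MAX_DELAY_NANOS, 2 * limit);
        }
    }

    int get() {
        return value.get();
    }
}
```

The random component keeps threads that failed at the same moment from retrying at the same moment again.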

13.3 Lock-Free List Set

(same thing as before, but we try to make it lock free again)

We use CAS to switch the pointers:

this works → the CAS decides who “wins” and completes its operation

Problems: same issue as before

We fixed this before by using mark bits. Let's see if we can fix it the same way:

no, since B can now be interrupted between its check and its update (with locks, nobody else could have accessed c or d in the meantime)
→ A can complete before B does → the state is corrupted

This is a fundamental problem → we can't atomically check and update both variables (the mark and the next pointer).
→ we want to atomically establish consistency of two things

13.3.1 `AtomicMarkableReference`

The solution to our problem from before (update / do two things “at once”) is an AtomicMarkableReference.

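Conceptually, this works because we can use a “free” bit of the address pointer for the mark, so that pointer and mark fit into one word and can be changed with one CAS.

In a 64-bit system, the addressable memory space is so large that giving up a few address bits doesn't change anything (nobody has 16 exabytes of memory…).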

Java provides AtomicMarkableReference<V> to address this. It atomically manages a reference V and a single boolean mark.

compareAndSet(expectedRef, newRef, expectedMark, newMark): Atomically sets the reference to newRef and the mark to newMark if and only if the current reference is expectedRef AND the current mark is expectedMark.
attemptMark(expectedRef, newMark): Atomically sets the mark to newMark if the current reference is expectedRef.
Other methods: get(), getReference(), isMarked(), set().

This effectively provides a way to perform a Double CAS (DCAS) on the reference and the mark bit together.
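A short sketch of the API in action. The class name MarkableRefDemo and the strings standing in for nodes c and d are made up for this example. The mark is set with compareAndSet rather than attemptMark, because the Javadoc allows attemptMark to fail spuriously.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

class MarkableRefDemo {
    static String run() {
        String c = "c";
        String d = "d";
        AtomicMarkableReference<String> next = new AtomicMarkableReference<>(c, false);

        // Reference and mark both match: succeeds, state is now (d, false).
        boolean swapped = next.compareAndSet(c, d, false, false);
        // Same reference, only the mark changes: state is now (d, true).
        boolean marked = next.compareAndSet(d, d, false, true);
        // The expected mark (false) is stale: fails, nothing changes.
        boolean stale = next.compareAndSet(d, c, false, false);

        boolean[] markHolder = new boolean[1];
        String ref = next.get(markHolder); // reads reference and mark together
        return swapped + " " + marked + " " + stale + " " + ref + " " + markHolder[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true true false d true"
    }
}
```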

13.3.2 Lock-free List Set with `AtomicMarkableReference`

remove:

logical delete: try to set the mark on c (i.e. in c.next, as that's where the mark of c is stored)
- this can fail if someone inserted after c, for example
- or because of a concurrent remove
  → re-traverse and retry
physical delete: we atomically change b.next from (c, false) (→ that means next is c and b is not marked) to (d, false) via CAS

This prevents the previous failure mode!

The physical step of remove(c) may now fail, e.g. because its predecessor has itself been marked in the meantime → c is still logically deleted, so this is fine.

Clean-up: whenever a thread finds a marked node on its path, it CASes the previous node's .next to the node after the marked one.
→ any thread can do this!
This “helping” is a classic pattern in lock-free programming: it lets all threads make progress.

Full code: [code listing; the part highlighted in blue is the “new” clean-up routine]

To keep many threads from contending to repair the same node → add randomness.

remove operation:

CAS for marking
- if the CAS fails → retry from the start
otherwise, try to physically delete the element
- if that fails, we just ignore it (the element is logically deleted anyway)!
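Putting marking, physical deletion and helping together, a list set could look like the sketch below. It follows the protocol described above but is not the original listing. The names LockFreeListSet and find, the int keys and the two sentinel nodes are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

class LockFreeListSet {
    private static final class Node {
        final int key;
        // mark == true means: THIS node is logically deleted
        final AtomicMarkableReference<Node> next;
        Node(int key, Node next) {
            this.key = key;
            this.next = new AtomicMarkableReference<>(next, false);
        }
    }

    // Sentinels; user keys must lie strictly between MIN_VALUE and MAX_VALUE.
    private final Node tail = new Node(Integer.MAX_VALUE, null);
    private final Node head = new Node(Integer.MIN_VALUE, tail);

    // Returns {pred, curr} with pred.key < key <= curr.key.
    // Marked nodes met on the way are unlinked (the helping clean-up).
    private Node[] find(int key) {
        retry:
        while (true) {
            Node pred = head;
            Node curr = pred.next.getReference();
            boolean[] marked = {false};
            while (true) {
                Node succ = curr.next.get(marked);
                while (marked[0]) { // curr is logically deleted: help unlink it
                    if (!pred.next.compareAndSet(curr, succ, false, false)) {
                        continue retry; // pred changed or is marked: restart
                    }
                    curr = succ;
                    succ = curr.next.get(marked);
                }
                if (curr.key >= key) {
                    return new Node[] {pred, curr};
                }
                pred = curr;
                curr = succ;
            }
        }
    }

    boolean add(int key) {
        while (true) {
            Node[] w = find(key);
            Node pred = w[0], curr = w[1];
            if (curr.key == key) {
                return false; // already present
            }
            Node node = new Node(key, curr);
            if (pred.next.compareAndSet(curr, node, false, false)) {
                return true;
            }
        }
    }

    boolean remove(int key) {
        while (true) {
            Node[] w = find(key);
            Node pred = w[0], curr = w[1];
            if (curr.key != key) {
                return false; // not present
            }
            Node succ = curr.next.getReference();
            // Logical delete: (succ, false) -> (succ, true) in curr.next.
            if (!curr.next.compareAndSet(succ, succ, false, true)) {
                continue; // insert after curr or concurrent remove: retry
            }
            // Physical delete: one attempt only; a later find() cleans up.
            pred.next.compareAndSet(curr, succ, false, false);
            return true;
        }
    }

    boolean contains(int key) {
        Node curr = head;
        while (curr.key < key) {
            curr = curr.next.getReference();
        }
        return curr.key == key && !curr.next.isMarked();
    }
}
```

Note that contains never writes and never retries, so it is wait-free, while add and remove are only lock-free.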

13.4 Lock-Free Unbounded Queue

This is a datastructure needed in the OS for example (see the BKL in Linux).

Operations: enqueue and dequeue. In a locked implementation, both run under a lock.
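A locked version could look like the following sketch; it is an assumption of what such an implementation typically looks like, not the original listing. The names LockedQueue and Node are chosen for this example, and dequeue is assumed to return null on an empty queue.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedQueue<T> {
    private static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private Node<T> head; // oldest element, null if the queue is empty
    private Node<T> tail; // newest element, null if the queue is empty

    void enqueue(T item) {
        Node<T> node = new Node<>(item);
        lock.lock();
        try {
            if (tail == null) {
                head = node;      // first element: head and tail both change
            } else {
                tail.next = node; // link behind the current last element
            }
            tail = node;
        } finally {
            lock.unlock();
        }
    }

    // Returns the oldest item, or null if the queue is empty (assumed convention).
    T dequeue() {
        lock.lock();
        try {
            if (head == null) {
                return null;
            }
            T item = head.item;
            head = head.next;
            if (head == null) {
                tail = null;      // queue became empty: tail changes as well
            }
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

The lock makes the updates of head, tail and tail.next appear as one step, which is exactly what a lock-free version has to achieve with single-word CAS operations.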

Problems in a lock-free version: potentially simultaneous updates of

head (dequeue)
tail
tail.next (enqueue)

Fix? A sentinel node (head always points to a dummy node, so the list is never structurally empty)

→ still problems:

we still have to update two pointers at a time (tail.next and tail)
possible inconsistency
- tail might transiently not point to the last element
- a thread might have to wait until consistency is re-established → a lock in disguise
- solution: threads help each other make progress

13.4.1 Atomic Version

We add atomic references.

Protocol:

enqueue
- read tail into last
- then try to set last.next
  - CAS(last.next, null, new)
- if this CAS fails → retry
- otherwise, try to set tail
  - without retry → if it fails, the new element is enqueued “logically only” and tail is fixed later
dequeue
- read head into first (the sentinel)
- read first.next into next (next is the item to dequeue)
- if next is available, read the item value of next (the value we pop and return)
- CAS(head, first, next) (set head to the new “sentinel”)
- if unsuccessful, retry

enqueue's CAS on last.next might fail because:

another thread got in first (its CAS succeeded before mine)
I read a stale tail
- I missed the update of another thread
- the other thread failed to update tail, e.g. because it just died
  - that's where the helping comes in → update tail for the other thread

Another failure mode: mixing enqueue and dequeue

Again: that's where the helping comes in → update tail for the other thread

Final solution: with helping other threads

enqueue: if last.next is not null anymore → tail is not the last element → we clean up first: advance tail to the new last element.

dequeue: in the edge case of a $1$-element queue → check whether tail still points to the node that is about to be removed, and advance tail first.
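The protocol with helping can be sketched as follows. This is a minimal version in the style of the classic lock-free queue with a sentinel node, not the original listing. The names LockFreeQueue and Node are chosen for this example, and dequeue is assumed to return null on an empty queue.

```java
import java.util.concurrent.atomic.AtomicReference;

class LockFreeQueue<T> {
    private static final class Node<T> {
        final T item;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T item) { this.item = item; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    LockFreeQueue() {
        Node<T> sentinel = new Node<>(null); // head always points to a dummy node
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    void enqueue(T item) {
        Node<T> node = new Node<>(item);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next != null) {
                // tail lags behind: help the other enqueuer, then retry
                tail.compareAndSet(last, next);
                continue;
            }
            if (last.next.compareAndSet(null, node)) {
                // linked in; if this CAS fails, another thread already helped
                tail.compareAndSet(last, node);
                return;
            }
        }
    }

    // Returns the oldest item, or null if the queue is empty (assumed convention).
    T dequeue() {
        while (true) {
            Node<T> first = head.get(); // the current sentinel
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (next == null) {
                return null; // nothing behind the sentinel: empty
            }
            if (first == last) {
                // tail still points to the node we are about to drop: advance it
                tail.compareAndSet(last, next);
                continue;
            }
            T item = next.item;
            if (head.compareAndSet(first, next)) {
                return item; // next is the new sentinel
            }
        }
    }
}
```

No thread ever waits for another thread to repair tail: whoever notices the lag fixes it, so a thread that dies between its two CAS operations cannot block the queue.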

13.5 Recap

Difference between blocking vs. non-blocking:

a blocking algorithm may wait indefinitely for another thread
a non-blocking algorithm must never wait indefinitely for another thread (so no locks)
- even without locks → we can end up in a “soft-lock” situation if we are not careful

Use CAS:

the more data items have to be kept in sync, the harder it gets
- we would need DCAS to update multiple values atomically
we learned some tricks:
- fake DCAS (AtomicMarkableReference)
- helper principle
- logical, then physical deletion
  - the logical deletion only needs a single atomic update
- sentinel nodes

Niklas @ ETHZ
