Recap We saw spinlocks vs. scheduled locks (wait, await)
Now we’ll do lock-free programming/datastructures

Spinlock problems:

  • scheduling fairness no FIFO behaviour
  • computing resources wasted, performance degradation
  • no notification mechanism

Locks with waiting problems: (semaphores, mutexes, monitors often implemented with such a lock)

  • require OS support
  • datastructures need to be protected against concurrent access
    • we do this using spinlocks
  • higher wakeup latency scheduler involved
  • if thread is delayed when in CS all threads suffer
  • what if thread dies in critical section
    • prone to deadlocks
  • locks cannot be used in interrupt handlers… (who handles the interrupt, in the interrupt handler)

13.1 Lock-Free Programming

Lock-freedom: (= deadlock free) at least one thread always makes progress even if other threads run concurrently

  • implies system-wide progress but not freedom from starvation
    Wait-freedom: (=starvation-free) all threads eventually make progress
  • implies freedom from starvation

Wait-freedom (stronger!) lock-freedom

We can see the mapping between the concepts names in the two programming paradigm:

Non-blocking algorithms: locks/blocking a thread can infinitely delay another thread

  • non-blocking = failure/suspension of one thread cannot cause failure or suspension of another thread.

Note, in Java the following holds:

ReentrantLock

We can assume the ReentrantLock is deadlock-free.
If we initialise ReentrantLock(true), then it’s also starvation-free, since the fairness-parameter is set to true.

To program lock-free we use atomic operations.
CAS: can be implemented lock-free in hardware

  • usually even wait-free (but depends on the java compiler for example)

13.1.2 Lock-Free Counter

Use CAS and redo until the counter was incremented by us.

public class CasCounter {
    private AtomicInteger value;
 
    public int getVal() {
        return value.get();
    }
 
    public int inc() {
        int v;
        do {
            v = value.get();
        } while (!value.compareAndSet(v, v + 1));
        return value;
    }
}

(usually in java we could just use .increment …)

This is a lock-free but not wait-free implementation it doesn’t need to terminate in finite steps.

  • that is because of the while loop
  • can be infinitely looping in worst-case

Positive result of CAS suggests that no other thread has written between .get and compareAndSet

  • we read a
  • other thread could have written b in between
  • other thread modifies it again to a
  • we compareAndSet returns
    ABA problem

13.1.3 DCAS

Double compare and swap

It allows performing a CAS on two memory locations at the same time.

Primitive (CPU operation) that can be used to implement software transactional memory (STM).

13.2 Lock-Free Stack

For a blocking stack, we can just use synchronized (global lock).

Non-blocking Stack implementation:

we use an atomic reference to the top element.

To pop/push, we:

  • save current top element as a local variable
  • for add: set our new elements .next
  • try to CAS the new element into the top
    if it fails retry

Pros:

  • Since this is lock-free deadlock-free by design.
  • slightly more performant in a real-world test

Fix we can use backoff to make the atomic operations faster prevent contention.

13.3 Lock-Free List Set

(same thing as before, but we try to make it lock free again)

We use CAS to switch the pointers:

works CAS decides who “wins” and completes the operation

Problems: Same issue as before

we fixed this before by using mark-bits. Let’s see if we can fix it the same way:

since B can be interrupted now (with locks we couldn’t have accessed c or d)
A can complete before B can corrupts the state

This is a fundamental problem we can’t “atomically” update / check both atomic variables.
want to atomically establish consistency of two things

13.3.1 AtomicMarkableReference

The solution to our problem from before (update / do two things “at once”) is an AtomicMarkableReference.

This works since we can just use a “free” bit in the address pointer.

  • In a 64-bit system, the addressable memory space is so large that removing a few bits doesn’t change anything (nobody has 9 trillion petabytes…).

Java provides AtomicMarkableReference<V> to address this. It atomically manages a reference V and a single boolean mark.

  • compareAndSet(expectedRef, newRef, expectedMark, newMark): Atomically sets the reference to newRef and the mark to newMark if and only if the current reference is expectedRef AND the current mark is expectedMark.
  • attemptMark(expectedRef, newMark): Atomically sets the mark to newMark if the current reference is expectedRef.
  • Other methods: get(), getReference(), isMarked(), set().

This effectively provides a way to perform a Double CAS (DCAS) on the reference and the mark bit together.

13.3.2 Lock-free List-Set Stack with AtomicMarkableReference

remove:

  • logical delete try to set mark on c (set mark on c.next as that’s where we store the mark)
    • this could fail if someone inserted after c for example
    • or concurrent remove
      retraverse and retry
  • physical delete: we atomically set b.next=(c, false) ( that means next is c and it’s not marked) to (d, false) via CAS

This prevents the previous failure mode!

remove(c) fails because it’s marked itself now still logically deleted so it’s fine.

Clean-up the clean-up is done when you find a marked node in your path CAS the previous node’s .next to the one after.
any thread can do this!
the “helping” is a classic pattern in lock-free programming to help all threads make progress.

Full Code

the blue part is the “new” clean-up routine.
To prevent contention of many threads trying to repair the same thing add randomness.

remove ops:

  • CAS for marking
    • if the CAS fails retry from start.
  • otherwise, try to physically delete the element.
    • If that fails, we just ignore it (logically deleted anyways)!

13.4 Lock-Free Unbounded Queue

This is a datastructure needed in the OS for example (see the BKL in Linux).

Operations the following is a locked implementation.

Problems in a lock-free version. Potentially simultaneous updates of

  • head (dequeue)
  • tail
  • tail.next (enqueue)

Fix? Sentinel value

still problems:

  • still have to update two pointers at a time
  • possible inconsistency
    • tail might transiently not point to the last element
      a thread might have to wait until consistency is established lock camouflaged
      solution: threads help making progress

13.4.1 Atomic Version

We add atomic references.

Protocol:

  • enqueue
    • read tail into last
    • then tries to set last.next
      • CAS(last.next, null, new)
    • retry
    • otherwise, try to set tail
      • without retry “logically only” if it fails
  • dequeue
    • read head into first (the sentinel)
    • read first.next into next (next is the item to dequeue)
    • if next is available, read the item value of next (the value we pop and return)
    • CAS(head, first, next) (set head to the new “sentinel”)
    • if unsuccessful, retry

enqueue’s CAS might fail because:

  • another thread pre-empted me
  • I read a stale tail
    • missed update of another thread
    • other thread failed in updating tail because it just died
      • that’s where the helping comes in update tail for the other thread

Another Failure mode mixing enqueue and dequeue

Again: that’s where the helping comes in update tail for the other thread

Final Solution: with helping other threads

if next is not null anymore tail is not last element we clean-up: update the tail to the new last.

in the edge-case of a element queue check for tail pointing to dequeued element.

13.5 Recap

Difference between blocking vs. non-blocking:

  • blocking may wait indefinitely for another thread
  • non-blocking must not wait indefinitely ever! (so no locks)
    • even without locks we might get into a “soft-lock” situation if not careful

Use CAS:

  • the more data-items needed to keep in sync the harder
    • need DCAS to update multiple values atomically
  • learned some tricks:
    • fake DCAS (atomic, markable reference)
    • helper principle
      • use logical and then physical deletion
      • the logical deletion only needs single atomic update
    • sentinel values