The software-only lock algorithms (Peterson, Filter, Bakery) require O(n) shared variables for n threads and rely on the absence of memory reordering (see 8.2 Locks).

8.1 Atomic Operations

Modern hardware instead provides read-modify-write (RMW) instructions that operate atomically on a single memory location, enabling locks with O(1) space.

8.1.1 TAS and CAS

Test-And-Set (TAS)

Atomically checks a memory location and sets it to 1. The variant below returns true if the location was 0 before the call (the set took effect) and false otherwise:

boolean TAS(memref s):
    if mem[s] == 0: mem[s] = 1; return true
    else:           return false

A spinlock spins calling TAS until it returns true (lock was free).
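The spin loop described above can be sketched in Java as follows. This is a minimal illustration, not a production lock: the class names TasLock and TasLockDemo and the helper run are made up for this example. Note that AtomicBoolean.getAndSet(true) returns the old value, so the loop spins while it returns true (lock was held), the inverse of the pseudocode's success flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical class name; a sketch of a TAS spinlock.
class TasLock {
    // false = free, true = held
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // getAndSet(true) returns the previous value: false means we took the lock.
        while (state.getAndSet(true)) {
            Thread.onSpinWait(); // spin hint, available since Java 9
        }
    }

    void unlock() {
        state.set(false);
    }
}

class TasLockDemo {
    static int counter = 0; // only accessed while holding the lock

    // Starts `threads` workers that each increment the counter `iters` times.
    static int run(int threads, int iters) {
        counter = 0;
        TasLock lock = new TasLock();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

Without the lock, the unsynchronized counter++ would typically lose updates and print less than 40000.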

Compare-And-Swap (CAS)

Atomically compares a memory location to an expected value and, on match, writes a new value:

int CAS(memref a, int cmp, int new):
    old = mem[a]
    if cmp == old: mem[a] = new
    return old

CAS succeeds iff it returns cmp. To acquire a lock, spin on it: while (CAS(lock, 0, 1) != 0) {}
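A CAS-based lock along these lines might look as follows in Java. The sketch assumes Java 9 or later, where AtomicInteger.compareAndExchange returns the value it found (like the pseudocode), while compareAndSet returns a boolean success flag. The class name CasLock and the constants FREE and HELD are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class name; a sketch of a CAS spinlock.
class CasLock {
    private static final int FREE = 0;
    private static final int HELD = 1;

    private final AtomicInteger lock = new AtomicInteger(FREE);

    void lock() {
        // compareAndExchange returns the value it observed; FREE means our write went through.
        while (lock.compareAndExchange(FREE, HELD) != FREE) {
            Thread.onSpinWait();
        }
    }

    // Single non-blocking attempt, using the boolean-returning variant.
    boolean tryLock() {
        return lock.compareAndSet(FREE, HELD);
    }

    void unlock() {
        lock.set(FREE);
    }
}
```

Unlike TAS, CAS can install arbitrary values, so the same pattern could store an owner id instead of 1.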

Both TAS and CAS are much slower than plain reads/writes — they require exclusive cache-line ownership and generate bus traffic on every call.

Note that both TAS and CAS can be used to implement a lock for a resource, with the value 1 serving as the “occupied” signal.

8.1.2 TAS Lock vs. TATAS Lock

Spinlock

A lock implemented using these instructions retries until it succeeds, “spinning” in place. Thus it’s called a spinlock.

A naive TAS spinlock hammers the bus with atomic operations even while the lock is held: each TAS by a waiting thread invalidates the cached copies of the lock variable on all other cores.

The Test-and-Test-and-Set (TATAS) lock avoids this by first spinning on a plain get() (cache-friendly read), only attempting the expensive CAS when the lock appears free:

do {
    while (state.get()) {}          // spin locally on cached value
} while (!state.compareAndSet(false, true));  // atomic attempt

This drastically reduces bus contention, but on lock release all waiting threads still simultaneously attempt CAS, causing a thundering herd.

8.1.3 Exponential Backoff

After a failed CAS attempt, a thread sleeps for a random duration and doubles the expected wait on each retry (up to MAX_DELAY). This spreads contention over time and, empirically, approaches O(1) per-thread cost.
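One way to sketch such a backoff lock in Java is shown below. It combines the test-and-test-and-set loop with a randomized, doubling delay. The class name BackoffLock, the helper nextLimit, and the concrete bounds MIN_DELAY_NANOS and MAX_DELAY_NANOS are assumptions for illustration; real values would need tuning per machine.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical class; the delay bounds are illustrative, not tuned values.
class BackoffLock {
    static final long MIN_DELAY_NANOS = 1_000;      // assumed initial bound
    static final long MAX_DELAY_NANOS = 1_000_000;  // assumed cap (the MAX_DELAY of the text)

    private final AtomicBoolean state = new AtomicBoolean(false);

    // Doubles the current bound, capped at MAX_DELAY_NANOS.
    static long nextLimit(long limit) {
        return Math.min(MAX_DELAY_NANOS, 2 * limit);
    }

    void lock() {
        long limit = MIN_DELAY_NANOS;
        while (true) {
            while (state.get()) {
                Thread.onSpinWait();            // spin on the locally cached value
            }
            if (state.compareAndSet(false, true)) {
                return;                         // acquired
            }
            // Lost the race: wait a random time in [1, limit], then double the bound.
            long delay = ThreadLocalRandom.current().nextLong(limit) + 1;
            LockSupport.parkNanos(delay);
            limit = nextLimit(limit);
        }
    }

    // Single non-blocking attempt.
    boolean tryLock() {
        return !state.get() && state.compareAndSet(false, true);
    }

    void unlock() {
        state.set(false);
    }
}
```

Drawing the delay uniformly from [1, limit] means the expected wait doubles whenever limit doubles. parkNanos may return early, which is harmless here because the loop simply re-checks the lock.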

Backoff beats TATAS beats TAS (empirically)

Under high contention, per-thread lock-acquire time for the backoff lock stays nearly flat as the thread count grows, whereas TAS latency grows roughly linearly.

8.2 Atomics in Java

Java exposes hardware RMW operations via java.util.concurrent.atomic.*:

  • AtomicBoolean / AtomicInteger — single-variable atomics
  • AtomicIntegerArray — per-element atomic access (needed for Peterson/Filter/Bakery)
  • Key methods: get(), set(v), getAndSet(v) (≈ TAS), compareAndSet(expect, update) (≈ CAS)

volatile arrays

Declaring volatile boolean[] only makes the array reference volatile; reads and writes of individual elements get no volatile visibility or ordering guarantees. Use AtomicIntegerArray instead.
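As an illustration, a two-thread Peterson lock can keep its flags in an AtomicIntegerArray, whose get and set have volatile semantics per element. This is a sketch under assumptions: the names PetersonLock and PetersonDemo and the helper run are invented here, thread ids are taken to be 0 and 1, and Thread.yield() is used in the wait loop only so the demo also progresses on machines with few cores.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical class; thread ids must be 0 or 1.
class PetersonLock {
    // flag[i] == 1 means thread i wants to enter; element accesses are volatile.
    private final AtomicIntegerArray flag = new AtomicIntegerArray(2);
    private volatile int victim;

    void lock(int id) {
        int other = 1 - id;
        flag.set(id, 1);   // announce interest
        victim = id;       // let the other thread go first
        while (flag.get(other) == 1 && victim == id) {
            Thread.yield();
        }
    }

    void unlock(int id) {
        flag.set(id, 0);
    }
}

class PetersonDemo {
    static int counter = 0; // only accessed while holding the lock

    // Two workers each increment the counter `iters` times.
    static int run(int iters) {
        counter = 0;
        PetersonLock lock = new PetersonLock();
        Thread[] workers = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.lock(id);
                    try { counter++; } finally { lock.unlock(id); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(5_000)); // prints 10000
    }
}
```

With a plain int[] or boolean[] for the flags, the JIT and the hardware would be free to reorder or cache the element accesses, and mutual exclusion could fail.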

Internally, compareAndSet is implemented through the JDK's Unsafe intrinsics (sun.misc.Unsafe.compareAndSwapInt in JDK 8 and earlier), which the JIT compiles to the hardware instruction: LOCK CMPXCHG on x86, an LDREX/STREX retry loop on ARMv7.
The Java specification does not guarantee that this is lock-free, but in practice it is on all mainstream JVMs.