The software-only lock algorithms (Peterson, Filter, Bakery) require O(n) shared variables for n threads and rely on the absence of memory reordering (see 8.2 Locks).
8.1 Atomic Operations
Modern hardware instead provides read-modify-write (RMW) instructions that operate atomically on a single memory location, enabling locks with O(1) space.
8.1.1 TAS and CAS
Test-And-Set (TAS)
Atomically tests a memory location and, if it holds 0, sets it to 1. The pseudocode below returns true when the set took place (the location was 0) and false otherwise:
boolean TAS(memref s):
    if mem[s] == 0:
        mem[s] = 1
        return true
    else:
        return false
A spinlock spins calling TAS until it returns true (lock was free).
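The TAS spinlock described above can be sketched in Java roughly as follows. This is a minimal sketch, not a reference implementation: the class name TASLock is an assumption, and AtomicBoolean.getAndSet(true) stands in for TAS. Note that getAndSet returns the previous value, so the lock was free when it returns false (the opposite polarity of the pseudocode's return value). Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal TAS spinlock sketch; the class name is illustrative.
final class TASLock {
    // false = free, true = held
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // getAndSet(true) atomically writes true and returns the previous value.
        // The lock is ours only if the previous value was false (free).
        while (state.getAndSet(true)) {
            Thread.onSpinWait(); // spin hint to the runtime (Java 9+)
        }
    }

    void unlock() {
        state.set(false); // volatile write publishes the critical section
    }
}
```

Typical use wraps the critical section in lock() and a finally block that calls unlock(), so the lock is released even if the critical section throws.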
Compare-And-Swap (CAS)
Atomically compares a memory location to an expected value and, on match, writes a new value:
int CAS(memref a, int cmp, int new):
    old = mem[a]
    if cmp == old:
        mem[a] = new
    return old
CAS succeeds iff it returns cmp. To acquire a lock, spin with while (CAS(lock, 0, 1) != 0).
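The acquire loop above maps fairly directly onto AtomicInteger.compareAndExchange (Java 9+), which, like the pseudocode, returns the value it found rather than a boolean. The sketch below assumes the encoding 0 = free, 1 = held; the class name CASLock and the tryLock helper are illustrative additions, not part of the notes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal CAS-based spinlock sketch; names are illustrative.
final class CASLock {
    private final AtomicInteger lock = new AtomicInteger(0); // 0 = free, 1 = held

    void lock() {
        // compareAndExchange returns the value it observed ("old").
        // The swap happened iff that value equals the expected 0.
        while (lock.compareAndExchange(0, 1) != 0) {
            Thread.onSpinWait();
        }
    }

    // Single non-blocking attempt; returns true iff the lock was acquired.
    boolean tryLock() {
        return lock.compareAndExchange(0, 1) == 0;
    }

    void unlock() {
        lock.set(0);
    }
}
```

The more commonly used compareAndSet(0, 1) gives the same behaviour but reports success as a boolean instead of returning the old value.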
Both TAS and CAS are much slower than plain reads/writes — they require exclusive cache-line ownership and generate bus traffic on every call.
Note that both TAS and CAS can be used to implement locking for a resource (by using 1 as an “occupied” signal).
8.1.2 TAS Lock vs. TATAS Lock
Spinlock
A lock implemented using these instructions retries until it succeeds, “spinning” in place. Thus it’s called a spinlock.
A naive TAS spinlock hammers the bus with atomic operations even while the lock is held, causing every waiting thread to invalidate cached copies on all other cores.

The Test-and-Test-and-Set (TATAS) lock avoids this by first spinning on a plain get() (cache-friendly read), only attempting the expensive CAS when the lock appears free:
do {
    while (state.get()) {} // spin locally on cached value
} while (!state.compareAndSet(false, true)); // atomic attempt
This drastically reduces bus contention, but on lock release all waiting threads still attempt CAS simultaneously, causing a thundering herd.
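For completeness, the loop can be wrapped into a small lock class along these lines. This is a sketch under the assumption that state is an AtomicBoolean with false meaning free; the class name TATASLock is illustrative.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Test-and-Test-and-Set lock sketch; the class name is illustrative.
final class TATASLock {
    private final AtomicBoolean state = new AtomicBoolean(false); // false = free

    void lock() {
        do {
            // "Test": plain volatile reads are served from the local cache
            // while the lock is held and nobody writes the line.
            while (state.get()) {
                Thread.onSpinWait();
            }
            // "Test-and-Set": one atomic attempt once the lock looks free.
        } while (!state.compareAndSet(false, true));
    }

    void unlock() {
        state.set(false);
    }
}
```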
8.1.3 Exponential Backoff
After a failed CAS attempt, a thread sleeps for a random duration and doubles the expected wait on each retry (up to MAX_DELAY). This spreads contention over time and approaches O(1) per-thread cost empirically.
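A backoff lock along these lines might look as follows. This is a hedged sketch: the MIN_DELAY and MAX_DELAY values (in nanoseconds), the class name BackoffLock, and the nextLimit helper are assumptions chosen for illustration, and LockSupport.parkNanos is just one way to wait.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// TATAS lock with randomized exponential backoff; names and delays are illustrative.
final class BackoffLock {
    static final long MIN_DELAY = 1_000;      // assumed lower bound, in ns
    static final long MAX_DELAY = 1_000_000;  // assumed upper bound, in ns

    private final AtomicBoolean state = new AtomicBoolean(false); // false = free

    // Doubles the current delay limit, capped at MAX_DELAY.
    static long nextLimit(long limit) {
        return Math.min(MAX_DELAY, 2 * limit);
    }

    void lock() {
        long limit = MIN_DELAY;
        while (true) {
            while (state.get()) {
                Thread.onSpinWait(); // spin locally while the lock is held
            }
            if (state.compareAndSet(false, true)) {
                return; // acquired
            }
            // Lost the race: wait a random time below the current limit,
            // then double the limit for the next failure.
            long delay = ThreadLocalRandom.current().nextLong(limit);
            LockSupport.parkNanos(delay);
            limit = nextLimit(limit);
        }
    }

    void unlock() {
        state.set(false);
    }
}
```

Randomizing the delay matters as much as doubling it: if all losers waited the same fixed time, they would collide again on the next attempt.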
Backoff beats TATAS beats TAS (empirically)
Under high contention, the per-thread lock-acquire time scales as follows: backoff latency is nearly flat with thread count, while TAS latency grows roughly linearly.

8.2 Atomics in Java
Java exposes hardware RMW operations via java.util.concurrent.atomic.*:
- AtomicBoolean / AtomicInteger: single-variable atomics
- AtomicIntegerArray: per-element atomic access (needed for Peterson/Filter/Bakery)
- Key methods: get(), set(v), getAndSet(v) (≈ TAS), compareAndSet(expect, update) (≈ CAS)
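As a small illustration of how these methods combine, the sketch below builds an increment from a compareAndSet retry loop. The class and method names (AtomicsDemo, casIncrement) are made up for this example; in real code AtomicInteger.incrementAndGet() already does this.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative helper; AtomicInteger.incrementAndGet() provides the same result.
final class AtomicsDemo {
    // Increments the counter with a CAS retry loop and returns the new value.
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int current = counter.get();       // read the current value
            int next = current + 1;            // compute the update locally
            if (counter.compareAndSet(current, next)) {
                return next;                   // nobody interfered: done
            }
            // Another thread changed the value in between: retry.
        }
    }
}
```

The same read, compute, compareAndSet pattern applies to any single-variable update that should not lose concurrent modifications.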
volatile arrays
Declaring volatile boolean[] only makes the array reference volatile; accesses to individual elements get no volatile (visibility and ordering) guarantees. Use AtomicIntegerArray instead.
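To show why per-element atomics matter, here is a sketch of the two-thread Peterson lock with its flag array stored in an AtomicIntegerArray. The class name, the 0/1 thread ids passed to lock and unlock, and the encoding 1 = "wants to enter" are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Two-thread Peterson lock sketch; thread ids must be 0 and 1.
final class PetersonLock {
    // flag[i] == 1 means thread i wants to enter; each element has volatile semantics.
    private final AtomicIntegerArray flag = new AtomicIntegerArray(2);
    private final AtomicInteger victim = new AtomicInteger(0);

    void lock(int me) {
        int other = 1 - me;
        flag.set(me, 1);   // announce interest
        victim.set(me);    // let the other thread go first on a tie
        // Wait while the other thread is interested and we are the victim.
        while (flag.get(other) == 1 && victim.get() == me) {
            Thread.onSpinWait();
        }
    }

    void unlock(int me) {
        flag.set(me, 0);   // no longer interested
    }
}
```

With a plain or volatile boolean[] in place of the AtomicIntegerArray, the element writes would carry no ordering guarantees and the algorithm could admit both threads at once.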
Internally, compareAndSet delegates to a JVM intrinsic (sun.misc.Unsafe.compareAndSwapInt in JDK 8), which maps to the hardware CAS instruction (LOCK CMPXCHG on x86) or to a load-linked/store-conditional loop (LDREX/STREX on ARM).
This is not guaranteed to be lock-free at the JVM spec level, but is in practice on all modern JVMs.