17. Transactional Memory

17.1 Motivation for STM

17.1.1 What’s wrong with Locks

Convoying: Threads pile up while waiting for a lock held by a descheduled thread.

Priority Inversion: A high-priority thread is blocked by a low-priority thread holding a lock it needs.

This could be solved with priority inheritance: the thread holding the lock temporarily gets the priority of the highest-priority thread waiting for it.

Lack of Composability: Combining independently developed thread-safe modules that use locks is difficult and error-prone.
→ Unforeseen interactions can cause deadlocks → you need to understand each module’s internal locking details.

17.1.2 What’s wrong with Lock-Free (using CAS)

Complexity: Lock-free algorithms are subtle and hard to prove correct.
Atomicity illusion: Operations often need multiple CAS steps.

need tricks (like the head/tail placeholders)
DCAS
ABA Problem

→ A multi-word compare-and-set operation would solve these issues.
This shows that we need “high-level” atomic operations.

It could look like this:

17.1.3 Bank Account Transfer

Problem 1: The transfer is not atomic. A thread can be descheduled between reading and writing a balance and then continue with a stale value.

Problem 2: Locking the two accounts in the wrong order can deadlock, for example when one thread transfers from a to b while another transfers from b to a.

→ We need to lock in account id order.
But this is hard to keep consistent, especially once more than two accounts are involved.

We always need to know in which order to lock. When transfer is called from higher-level code that also uses locks, we need to be careful! → not composable
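The id-ordered locking discipline can be sketched in Java as follows. This is a minimal illustration, not the code from the lecture: the class `Account`, its fields, and the `transfer` signature are hypothetical, and the sketch assumes that account ids are unique.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type; all names are illustrative only.
class Account {
    final int id;                 // assumed to be unique per account
    private long balance;
    final ReentrantLock lock = new ReentrantLock();

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long getBalance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Moves 'amount' from 'from' to 'to'. Both locks are always taken in
    // ascending id order, so two opposite transfers cannot deadlock.
    static boolean transfer(Account from, Account to, long amount) {
        if (from == to) {
            return true; // same account: nothing to move
        }
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                if (from.balance < amount) {
                    return false; // insufficient funds, nothing changed
                }
                from.balance -= amount;
                to.balance += amount;
                return true;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

The ordering rule lives inside `transfer`. Any caller that already holds some other lock while calling it can still break the global order, which is exactly the composability problem described above.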

17.1.5 Solution?

17.2 Transactional Memory

The programmer explicitly defines atomic code sections.
The programmer is then only concerned with:

what: which operations should be atomic
not how: how atomicity is achieved is left to the system

This is a declarative approach.

Transactional Memory is:

simpler and less error-prone
higher-level (declarative) semantics
composable
optimistic by design (does not require mutual exclusion)
- we only pay a cost when there actually is contention, not like with locks

TM executes transactions atomically.

It does this by guaranteeing isolation.

TM guarantees us:

Atomicity: changes made by a transaction become visible all at once → on commit
- we cannot see intermediate states
- not achieved using mutual exclusion, as with locks
Isolation: transactions execute as if they were running alone → each works on a snapshot of the state before it started.
Serializability: the concurrent execution of multiple transactions must be equivalent to some sequential execution
- final outcome is consistent with a serial ordering
- otherwise we’d be violating atomicity and isolation…

Note: these properties are inspired by the ACID principles of databases:

atomicity
consistency
isolation
(durability)

17.2.1 How is TM implemented

Executing every transaction under a single big lock is not:

optimistic
composable
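To make the big-lock approach concrete, here is a minimal Java sketch. The class `BigLockTM` and its `atomic` method are hypothetical names chosen for illustration; they do not belong to any real TM library.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical "TM" that runs every atomic section under one global lock.
class BigLockTM {
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    // Runs 'body' while holding the global lock and returns its result.
    static <T> T atomic(Supplier<T> body) {
        GLOBAL.lock();
        try {
            return body.get();
        } finally {
            GLOBAL.unlock();
        }
    }
}
```

A call such as `BigLockTM.atomic(() -> { ...; return result; })` does execute atomically, but every atomic section takes the same lock. Two sections that touch completely disjoint data still run one after the other, so the cost is paid on every call rather than only under contention.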

17.2.2 Conflicts

Thread A has started a transaction but not finished it. It has already read a into x (x = a).
Then thread B starts, writes a = 10 and commits.
Now that write needs to be visible to all threads!

So whatever A did cannot occur after B’s commit in the serialization order!
Otherwise we’d have a magically changing value…

The solution is the concurrency control (CC) mechanism.

The CC mechanism must detect the conflict of A having read from a location subsequently modified and committed by B.
→ it must then abort A’s transaction and make it retry.

Example Bank Account

A cannot read b = 95 → otherwise isolation is violated!
So either it must read a = 105 AND b = 95 or not see the transaction by B at all.

Example

We must immediately abort the transaction as soon as something is inconsistent → otherwise we can have exceptions which could never occur in isolation.

17.2.5 Consistency Guarantee

17.3 Design Choices for STM

STM: Implemented purely in software, often as a library or language feature.

Pros: Greater flexibility, not limited by hardware resources (can handle large transactions).
Cons: Can have significant performance overhead (instrumentation, logging reads/writes). Achieving good performance is challenging.

Hardware TM: Implemented directly in processor hardware (e.g., using cache coherence protocol extensions to track read/write sets).

Pros: Can be very fast, low overhead for common cases.
Cons: Limited by hardware resources (cache size, buffer sizes). Often cannot handle large transactions that exceed these bounds. Aborts can be expensive.
Examples: Intel Haswell (TSX/RTM - largely deprecated/disabled due to security flaws), IBM Blue Gene/Q, Sun Rock (cancelled).

17.3.1 Strong vs. Weak Isolation

Strong isolation: what if a variable that is accessed inside a transaction is also modified outside of any transaction?

the TM still guarantees serializability, isolation, etc. in that case
easier to port code
harder for the TM to implement

Weak isolation: we assume all accesses to potentially shared data happen inside transactions

requires programmer discipline.
Scala-STM does this.

17.3.2 Nesting

How do we handle nested atomic transactions? This is necessary for composable transactions!

Flattened

The inner transaction is merged into the outer one. This is simpler, but limits modularity.
Transaction sizes grow → when the inner transaction aborts, everything is aborted…

Closed

Inner transactions can abort without aborting the outer transaction.
Only when the outer transaction commits do the changes of the inner transactions get committed globally.

17.3.3 Scope

What variables are tracked by the TM implementation?

all variables?
- easier to port
- high overhead for TM system
reference-based
- only special “transactional memory types” are tracked
- explicitly marks mutable shared state
- requires programmer discipline.
- hard to port: all variable types have to be changed

We can isolate special variables that are immutable outside of a transaction.

17.4 Scala STM

17.4.1 Bank Account Example

This is how we wish STM looked with compiler support:

In ScalaSTM, we get the following syntax:

GetBalance:

Transfer Function

What if account a does not have enough funds?
When a precondition does not hold (such as insufficient funds), we can call STM.retry() to abort the transaction and run it again later.

This automatically works because we already have a system to detect changes in variables!

retry() is like wait(): it
- aborts the transaction
- puts the thread to sleep
- re-runs the transaction once a variable that was read/modified before retry() changes

Retry in ScalaSTM

The retry() function is like wait() with locks.
It waits until a previously read/modified variable changes and then re-runs the transaction.
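For comparison, this is roughly what the lock-based counterpart of the analogy looks like in Java. It is only a sketch of the wait()/notifyAll() pattern, not ScalaSTM code; the class `WaitingAccount` and its methods are hypothetical.

```java
// Hypothetical monitor-based account: withdraw blocks until funds suffice.
class WaitingAccount {
    private long balance;

    WaitingAccount(long balance) {
        this.balance = balance;
    }

    synchronized void deposit(long amount) {
        balance += amount;
        notifyAll(); // must be called by hand, or waiters sleep forever
    }

    // Sleeps while the precondition does not hold, then re-checks it.
    synchronized void withdraw(long amount) throws InterruptedException {
        while (balance < amount) {
            wait(); // releases the monitor while sleeping
        }
        balance -= amount;
    }

    synchronized long getBalance() {
        return balance;
    }
}
```

With locks, every method that might make the condition true has to remember to call `notifyAll()`. With retry(), the wake-up comes from the system, since it already knows which variables the transaction has read.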

How does this work internally?

The system keeps track of all variables that the transaction has read or written (read-set and write-set).

these sets can grow very large → tracking them is potentially expensive

17.5 Simple STM Implementation

We have the following things:

Threads: Have states (active, aborted, committed).
Transactional Objects: Represent shared state, support read (get), write (set), and crucially, versioning or copying.
Concurrency Control: Needed to ensure isolation and atomicity.

17.5.1 Clock-Based STM System

STMs are typically based on a “global clock”.

this is not physical time (= wall clock)
more like an incrementing counter (= logical clock)

It is incremented every time the system commits a transaction.

The clock based system works like this:

A global version clock advances on every commit.
Each transaction gets a birthdate (clock value when it starts).
Each transactional object has a timestamp (clock value when it was last committed).
Transactions maintain local read-sets and write-sets.

Read obj:

Check thread’s local write-set first. If found, return buffered value.
- this gives us isolation
Otherwise, check obj.timestamp <= tx.birthdate. If not, the object was modified after the transaction started → conflict → abort transaction.
If timestamp is okay, read the object’s committed value
- add obj to the read-set.

Write obj:

If obj not in write-set
- create a private copy in the write-set (write buffering).
From now on: update the private copy.

Example:

In this example, Z’s timestamp > T’s birthdate, so we must abort: Z was written after T started.

Continuing would violate serializability.

Commit

Acquire commit lock (simplest CC)
Validate read-set:
- For every obj in read-set, check obj.timestamp <= tx.birthdate again
  - → checks for conflicts that occurred during execution
- If validation fails, abort.
If valid, increment global clock.
Update timestamps of all objects in the write-set to the new clock value.
Write buffered values back to main memory
Release commit lock.

There are two possible execution paths:

Successful commit
Because we don’t modify / read Z, we can commit without issue.
Aborted commit
The issue that Z’s timestamp > birthdate is detected during the commit.
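The read, write, and commit rules of the clock-based system can be put together in a small Java sketch. All names (`TObject`, `Tx`, `TxAborted`) are hypothetical, and this is a teaching sketch under simplifying assumptions, not a tuned or verified STM. It adds two things that are not part of the steps listed above: the birthdate is sampled while holding the commit lock, and a read re-checks the timestamp after loading the value. Both are there so that a transaction does not observe a half-finished write-back.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical names throughout.
class TxAborted extends RuntimeException {}

class TObject<T> {
    volatile T value;        // last committed value
    volatile long timestamp; // clock value of the commit that wrote it
    TObject(T initial) { value = initial; }
}

class Tx {
    static final AtomicLong CLOCK = new AtomicLong(); // global version clock
    static final ReentrantLock COMMIT_LOCK = new ReentrantLock();

    final long birthdate = sampleClock();
    private final Set<TObject<?>> readSet = new HashSet<>();
    private final Map<TObject<?>, Object> writeSet = new HashMap<>();

    // Assumption of this sketch: take the birthdate under the commit lock
    // so that no transaction is born in the middle of a write-back.
    private static long sampleClock() {
        COMMIT_LOCK.lock();
        try { return CLOCK.get(); } finally { COMMIT_LOCK.unlock(); }
    }

    @SuppressWarnings("unchecked")
    <T> T read(TObject<T> obj) {
        if (writeSet.containsKey(obj)) {
            return (T) writeSet.get(obj);            // our own buffered write
        }
        if (obj.timestamp > birthdate) throw new TxAborted(); // written after we started
        T v = obj.value;
        // Extra re-check (not in the listed steps): catches a write-back that
        // happened between the timestamp check and the value read.
        if (obj.timestamp > birthdate) throw new TxAborted();
        readSet.add(obj);
        return v;
    }

    <T> void write(TObject<T> obj, T v) {
        writeSet.put(obj, v); // private copy; shared memory is untouched until commit
    }

    @SuppressWarnings("unchecked")
    void commit() {
        COMMIT_LOCK.lock();
        try {
            for (TObject<?> obj : readSet) {          // validate the read-set
                if (obj.timestamp > birthdate) throw new TxAborted();
            }
            long now = CLOCK.incrementAndGet();       // advance the global clock
            for (TObject<?> obj : writeSet.keySet()) {
                obj.timestamp = now;                  // stamp written objects
            }
            for (Map.Entry<TObject<?>, Object> e : writeSet.entrySet()) {
                ((TObject<Object>) e.getKey()).value = e.getValue(); // write back
            }
        } finally {
            COMMIT_LOCK.unlock();
        }
    }
}
```

A caller would create a `Tx`, perform `read` and `write` calls, call `commit()`, and start over with a fresh `Tx` whenever `TxAborted` is thrown. Both execution paths above show up here: a transaction that never touches Z commits, while one that has Z in its read-set fails validation once Z’s timestamp exceeds its birthdate.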