Systolic arrays are a highly special-purpose paradigm → designed for matrix multiplication, convolution, etc.
→ used in TPUs
Differences from pipelining:
- systolic arrays can perform multidimensional array computations
- each PE (processing element) can execute a “kernel”, not just one instruction
The challenge is to orchestrate memory accesses so that every operand is ready at its PE in the cycle it is needed
- the data needs to be carefully placed (ordered and timed) at the array's inputs
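
To make the input placement concrete, here is a minimal sketch of one common arrangement (an assumption, not necessarily the one used in the lecture): an output-stationary array for C = A·B in which PE(i, j) accumulates C[i][j], values of A flow to the right, and values of B flow downward. Row i of A enters i cycles late and column j of B enters j cycles late, so that matching operands meet in the right PE. The name systolic_matmul and the register layout are hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an output-stationary systolic array."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"

    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]  # value each PE passes to its right neighbour
    b_reg = [[0] * m for _ in range(n)]  # value each PE passes to the PE below

    # The last operand pair reaches PE(n-1, m-1) in cycle (k-1) + (n-1) + (m-1).
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Left edge: row i of A is delayed by i cycles (zeros as padding).
                    a_in = A[i][t - i] if 0 <= t - i < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    # Top edge: column j of B is delayed by j cycles.
                    b_in = B[t - j][j] if 0 <= t - j < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # the PE's multiply-accumulate
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b       # all PEs update in lockstep
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each element of A and B is read from memory only once and then reused by every PE it passes through.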

We can also arrange the PEs in other structures (triangular, hex cells, etc…) for different tasks.
Pros:
- efficient use of limited memory bandwidth (each fetched value is reused by several PEs)
- specialised (computation needs to fit the PE organization/functions) → efficient for the workloads that fit
Cons:
- specialised → not generally applicable
19.2 General Systolic Computational Model
We can chain a series of “more general” PEs one after another.
This allows less specialised computations to run.
Basically, instead of writing back to memory after each instruction as a normal CPU does, each PE just passes its output forward to the next one.
Since intermediate results do not travel to memory and back, less energy is wasted on data movement.
This needs a specialised compiler to map programs onto the PEs and optimise them.
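
The forwarding idea can be sketched as a small simulation. This is an illustrative model under stated assumptions, not a description of a real machine: each PE is modelled as a Python function (its “kernel”), holds one output register, and hands that register to its neighbour once per cycle. The name run_pe_chain is hypothetical, and None is reserved to mark an empty slot, so input values must not be None.

```python
def run_pe_chain(kernels, stream):
    """Push `stream` through a chain of PEs, one hop per cycle."""
    regs = [None] * len(kernels)  # output register of each PE (None = empty slot)
    results = []

    # Feed the real inputs, then empty slots until the chain has drained.
    for item in list(stream) + [None] * len(kernels):
        # Whatever the last PE produced in the previous cycle leaves the chain.
        if regs[-1] is not None:
            results.append(regs[-1])
        # Update back to front so each PE reads its neighbour's old output.
        for p in range(len(kernels) - 1, 0, -1):
            prev = regs[p - 1]
            regs[p] = None if prev is None else kernels[p](prev)
        regs[0] = None if item is None else kernels[0](item)
    return results


# Three hypothetical kernels: double, add one, square.
chain = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]
print(run_pe_chain(chain, [1, 2, 3]))
```

In this model only the first PE reads from the input stream and only the last PE writes a result; everything in between moves from register to register.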