by almostgotcaught 454 days ago
> systolic array designs, an efficient type of hardware design for matrix multiplication (e.g., the Google TPU uses this), as opposed to more SIMD-like vector architectures like GPUs

this is wrong. TPUv4 has tensor cores just like NVIDIA has tensor cores just like AMD has tensor cores. no one uses a systolic array because bandwidth/connectivity is much scarcer than compute. the only people that keep talking about them are academics that don't actually fab/sell chips.

https://cloud.google.com/tpu/docs/v4

https://www.nvidia.com/en-us/data-center/tensor-cores/

https://rocm.docs.amd.com/projects/rocWMMA/en/latest/what-is...

ninja edit: before you gotcha me with "a tensor core is a systolic array!!!" - most tensor cores are actually outer-product engines, not riffle-shuffle engines (or whatever you wanna call the topology corresponding to a systolic array).
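For anyone unsure what "outer-product engine" means here, a minimal sketch in plain Python: the matrix product is built up as a sum of rank-1 outer products, one per step along the shared dimension. The function name `matmul_outer_product` is made up for illustration, and this says nothing about how any vendor's hardware is actually wired.

```python
def matmul_outer_product(a, b):
    """Compute a @ b as a sum of rank-1 outer products (illustrative only)."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # column k of A
        row = b[k]                         # row k of B
        # One step: every (i, j) accumulator receives col[i] * row[j].
        # In hardware this would be a broadcast of col and row to all
        # accumulators, with no neighbor-to-neighbor forwarding.
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c
```

The point of the formulation is that each step needs only one column of A and one row of B, broadcast to every accumulator, rather than values hopping between neighboring units.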


https://cloud.google.com/tpu/docs/system-architecture-tpu-vm...

>The primary task for TPUs is matrix processing, which is a combination of multiply and accumulate operations. TPUs contain thousands of multiply-accumulators that are directly connected to each other to form a large physical matrix. This is called a systolic array architecture. Cloud TPU v3, contain two systolic arrays of 128 x 128 ALUs, on a single processor.
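To make "multiply-accumulators that are directly connected to each other" concrete, here is a toy cycle-by-cycle simulation in plain Python. It assumes an output-stationary dataflow (each cell keeps its own accumulator, A values move right, B values move down), which is a textbook choice and not something the quoted docs specify; `systolic_matmul` and the register names are hypothetical, and this is not a model of the actual TPU matrix unit.

```python
def systolic_matmul(a, b):
    """Toy output-stationary systolic array computing a @ b (illustrative only)."""
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]        # one accumulator per cell
    a_reg = [[None] * n for _ in range(m)]   # A value currently held in each cell
    b_reg = [[None] * n for _ in range(m)]   # B value currently held in each cell
    total_cycles = k_dim + m + n - 2         # time for the last operands to reach the far corner

    for t in range(total_cycles):
        new_a = [[None] * n for _ in range(m)]
        new_b = [[None] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                if j == 0:
                    # Left edge: row i of A is fed in, delayed (skewed) by i cycles.
                    k = t - i
                    new_a[i][j] = a[i][k] if 0 <= k < k_dim else None
                else:
                    # Interior: take the value the left neighbor held last cycle.
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    # Top edge: column j of B is fed in, delayed by j cycles.
                    k = t - j
                    new_b[i][j] = b[k][j] if 0 <= k < k_dim else None
                else:
                    # Interior: take the value the upper neighbor held last cycle.
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every cell multiplies the pair it currently holds and accumulates.
        for i in range(m):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

Note that interior cells only ever read from their left and upper neighbors; operands enter at the array edges and nothing is broadcast.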

I don't see any contradiction between your claim that TPU v3 uses systolic arrays and the parent post's claim that TPU v4 does not.

The TPU obviously uses a systolic array: https://jax-ml.github.io/scaling-book/tpus/

Fair enough - my understanding was they moved away from systolic arrays. I stand corrected. I will also say it is well known that systolic arrays are basically impossible to program/build a compiler for.

This is why Google has 500 people working on the TPU compiler team.