HN Mirror


by almostgotcaught 454 days ago

> systolic array designs, an efficient type of hardware design for matrix multiplication (e.g., the Google TPU uses this), as opposed to more SIMD-like vector architectures like GPUs

this is wrong. TPUv4 has tensor cores just like NVIDIA has tensor cores just like AMD has tensor cores. no one uses a systolic array because bandwidth/connectivity is much scarcer than compute. the only people that keep talking about them are academics that don't actually fab/sell chips.

https://cloud.google.com/tpu/docs/v4

https://www.nvidia.com/en-us/data-center/tensor-cores/

https://rocm.docs.amd.com/projects/rocWMMA/en/latest/what-is...

ninja edit: before you gotcha me with "a tensor core is a systolic array!!!" - most tensor cores are actually outer-product engines, not riffle-shuffle engines (or whatever you wanna call the topology corresponding to a systolic array).
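
For anyone unsure what "outer-product engine" means here, a minimal sketch in plain Python, assuming nothing about any vendor's real hardware: the product is built as a sum of rank-1 outer products, one per step along the shared dimension, and every accumulator updates in the same step without passing operands to a neighbour. The function name `matmul_outer_product` is made up for illustration.

```python
def matmul_outer_product(a, b):
    """Compute a @ b as a sum of rank-1 outer products (illustrative only)."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # column k of A
        row = b[k]                         # row k of B
        # All (i, j) accumulators take col[i] * row[j] in the same step;
        # operands are broadcast, not handed from cell to cell.
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c


result = matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The point of the contrast: here each step broadcasts one column and one row to all accumulators, whereas a systolic array moves operands hop by hop between neighbouring cells.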

2 comments

imtringued 454 days ago

https://cloud.google.com/tpu/docs/system-architecture-tpu-vm...

>The primary task for TPUs is matrix processing, which is a combination of multiply and accumulate operations. TPUs contain thousands of multiply-accumulators that are directly connected to each other to form a large physical matrix. This is called a systolic array architecture. Cloud TPU v3, contain two systolic arrays of 128 x 128 ALUs, on a single processor.
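
A toy cycle-by-cycle simulation of what "multiply-accumulators directly connected to each other" can look like. This is a sketch under assumptions, not the TPU's actual dataflow: it models an output-stationary array where values of A flow left to right, values of B flow top to bottom, and each cell keeps its own running sum. The name `systolic_matmul` and the skewed input schedule are illustrative choices.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b (toy model)."""
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]    # one accumulator per cell
    a_reg = [[0] * n for _ in range(m)]  # A operands, moving right
    b_reg = [[0] * n for _ in range(m)]  # B operands, moving down
    # The last operand pair reaches the bottom-right cell at this cycle count.
    for t in range(k_dim + m + n - 2):
        for i in range(m):
            # Shift row i one hop to the right, then feed the left edge.
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i  # row i is delayed by i cycles (input skew)
            a_reg[i][0] = a[i][k] if 0 <= k < k_dim else 0
        for j in range(n):
            # Shift column j one hop down, then feed the top edge.
            for i in range(m - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j  # column j is delayed by j cycles (input skew)
            b_reg[0][j] = b[k][j] if 0 <= k < k_dim else 0
        # Each cell multiplies the two operands currently passing through it.
        for i in range(m):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


product = systolic_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

Each cell only ever talks to its left and upper neighbours, which is the property the quoted docs describe; the skewed feeding makes matching elements of A and B arrive at a cell in the same cycle.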


robinhouston 454 days ago

I don't see any contradiction between your claim that TPU v3 uses systolic arrays and the parent post's claim that TPU v4 does not.


FL33TW00D 454 days ago

The TPU obviously uses a systolic array: https://jax-ml.github.io/scaling-book/tpus/


almostgotcaught 453 days ago

Fair enough - my understanding was they moved away from systolic arrays. I stand corrected. I will also say it is well-known that systolic arrays are basically impossible to program or build a compiler for.


FL33TW00D 453 days ago

This is why Google has 500 people working on the TPU compiler team.
