
by dragontamer 2040 days ago

Because bandwidth-optimized computers do more than just matrix-multiply all day long.

Tensor Processing Units are too specialized: they can't traverse a linked list, they can't traverse trees. They're good at one thing and one thing only: matrix multiplication.

GPUs are still bandwidth-optimized and are good at matrix multiplication (though not as good as tensor units). But GPUs can also traverse trees and other data structures: BVH trees for raytracing, linked lists, or whatever else you need. It's a general-purpose computer, a weird one with terrible latency and HUGE bandwidth, but that's still useful in many compute applications.
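To make "GPUs can traverse trees" concrete, here is a minimal sketch of how a BVH is commonly walked in GPU code: the tree is flattened into parallel arrays and traversed with an explicit stack instead of recursion. It is written in Python for readability rather than as a real CUDA kernel, and it assumes 1D bounding intervals instead of 3D boxes to keep it short. `FlatBVH` and `query` are hypothetical names, not any library's API.

```python
from dataclasses import dataclass


@dataclass
class FlatBVH:
    # Hypothetical layout: one entry per node in parallel arrays
    # (a "structure of arrays", which suits GPU memory access).
    lo: list[float]   # lower bound of the node's 1D bounding interval
    hi: list[float]   # upper bound of the node's 1D bounding interval
    left: list[int]   # index of the left child, or -1 for a leaf
    right: list[int]  # index of the right child, or -1 for a leaf
    prim: list[int]   # primitive id stored at a leaf, or -1 for an inner node


def query(bvh: FlatBVH, x: float) -> list[int]:
    """Return ids of leaf primitives whose interval contains x.

    Uses an explicit stack rather than recursion, the way a GPU thread
    typically walks a tree.
    """
    hits = []
    stack = [0]  # start at the root, stored at index 0
    while stack:
        n = stack.pop()
        if not (bvh.lo[n] <= x <= bvh.hi[n]):
            continue  # prune: x lies outside this whole subtree
        if bvh.prim[n] >= 0:
            hits.append(bvh.prim[n])  # leaf: record the primitive
        else:
            # Push right first so the left child is visited first.
            stack.append(bvh.right[n])
            stack.append(bvh.left[n])
    return hits
```

For example, a root covering [0, 10] with leaves [0, 4] (primitive 0) and [3, 10] (primitive 1) returns `[0, 1]` for `query(bvh, 3.5)`. On a GPU, each thread would run this loop for its own ray or query point.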

--------------

Matrix multiplication is the cornerstone of many scientific problems. But you still need software to manipulate the data into the correct "form", so that the matrix multiplication units can then process the data.
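One well-known example of this kind of reshaping (chosen here for illustration; the comment doesn't name it) is "im2col": a 1D sliding-window filter becomes a single matrix multiplication once the input is rearranged so that each row is one window. This is a minimal pure-Python sketch; the function names are hypothetical, and `correlate_1d` computes cross-correlation (no kernel flip), which is what deep-learning "convolution" layers usually compute.

```python
def im2col_1d(x, m):
    """Rearrange signal x into a matrix whose rows are the length-m sliding windows."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]


def matmul(A, B):
    """Naive matrix product A @ B: the step a tensor unit would accelerate."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def correlate_1d(x, filters):
    """Apply several equal-length filters to x via one matrix multiplication."""
    m = len(filters[0])
    windows = im2col_1d(x, m)                          # shape (n - m + 1, m)
    kernel_matrix = [list(c) for c in zip(*filters)]   # shape (m, num_filters)
    return matmul(windows, kernel_matrix)              # shape (n - m + 1, num_filters)
```

With `x = [1, 2, 3, 4]` and filters `[1, 0, -1]` and `[1, 1, 1]`, the windows are `[1, 2, 3]` and `[2, 3, 4]`, so the result is `[[-2, 6], [-2, 9]]`. Building `windows` is the data-shuffling "preprocessing" work; only `matmul` maps onto the matrix units.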

It's in this "preprocessing" or "postprocessing" phase where GPUs do best. You can implement bitonic sort for highly parallel sorting and searching. You can run GPU-accelerated join networks for SQL. Etc., etc.
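Bitonic sort fits GPUs because it is a fixed sorting network: every compare-exchange within a stage touches a disjoint pair of elements, so all of them can run at once. This is a minimal sequential Python sketch of the standard iterative formulation, assuming a power-of-two input length; on a GPU, the innermost `for i` loop would typically become one thread per index.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence with a bitonic sorting network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j > 0:
            # Each i pairs with a distinct partner i ^ j, so every
            # iteration of this loop is independent of the others.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The network performs the same comparisons regardless of the input data, so there is no data-dependent branching on which pairs to compare. That trades an O(n log² n) comparison count for massive parallelism, a good fit for a high-bandwidth, high-latency machine.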

And even then, NVIDIA's A100 has incredibly good matrix-multiplication units (tensor cores), so you're really not losing much anyway.