HN Mirror



	by halflings 810 days ago
	No HBM because they use tons of fast SRAM instead. Isn't that the main driver for performance here? (The way I understood it, it's still cost-effective at scale because of the throughput increase this brings.)
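A back-of-the-envelope way to see why memory bandwidth matters here: at batch size 1, generating each token roughly requires reading every weight once, so decode speed is capped near memory bandwidth divided by model size in bytes. The sketch below assumes that simplified model (it ignores KV-cache reads, compute limits, and inter-chip links), and every number in it is a hypothetical placeholder, not a Groq or HBM spec. A model this large only fits in SRAM when spread across many chips, since each chip carries a limited amount of on-chip memory.

```python
def max_tokens_per_second(param_count: float, bytes_per_param: float,
                          mem_bandwidth_bytes_s: float) -> float:
    """Bandwidth-limited ceiling on decode tokens/s at batch size 1.

    Assumes each generated token streams every weight byte from memory once.
    """
    bytes_per_token = param_count * bytes_per_param
    return mem_bandwidth_bytes_s / bytes_per_token


if __name__ == "__main__":
    params = 70e9          # hypothetical 70B-parameter model
    bytes_per_param = 1.0  # hypothetical 8-bit weights
    # Both bandwidth figures are assumed placeholders for illustration only.
    for label, bw in [("HBM-class, 3 TB/s (assumed)", 3e12),
                      ("aggregate SRAM, 80 TB/s (assumed)", 80e12)]:
        tps = max_tokens_per_second(params, bytes_per_param, bw)
        print(f"{label}: ~{tps:.0f} tokens/s ceiling")
```

The ceiling scales linearly with bandwidth, which is why trading HBM capacity for much faster (but much smaller) SRAM can pay off for latency, at the price of needing many more chips to hold the weights.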


gandalfgeek 810 days ago

> No HBM because they use tons of fast SRAM instead. Isn't that the main driver for performance here?

No doubt fast SRAM helps, but from a computation POV, IMHO it's that they've statically planned the computation and eliminated all locks.

Short explainer here: https://www.youtube.com/watch?v=H77tV1KcWIE (Based on their paper).


halflings 808 days ago

Thanks for putting this together! Will give it a watch now


germanjoey 810 days ago

Cost-effective in what sense? Groq doesn't achieve high efficiency, only low latency, and that isn't done in a cost-effective way. Compare SambaNova, which achieves the same performance with 8 chips instead of 568, and at higher precision.


halflings 808 days ago

The # of chips is not the most important metric.

Most important, even ignoring latency, is throughput (tokens) per $$$. And according to their own benchmark [1] (famous last words :)), they're quite cost-efficient.
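The tokens-per-dollar framing boils down to one formula: aggregate throughput divided by the hourly cost of the hardware serving it. The sketch below uses hypothetical throughputs and per-chip-hour prices (not figures from either vendor or from the linked analysis) to show how a system with far more chips can still come out ahead if each chip is cheap enough. It ignores power, networking, and utilization.

```python
def tokens_per_dollar(tokens_per_second: float, chips: int,
                      cost_per_chip_hour: float) -> float:
    """Tokens served per dollar of hardware time.

    tokens_per_second is aggregate throughput across all concurrent users.
    Ignores power, networking, and idle time (simplifying assumptions).
    """
    dollars_per_hour = chips * cost_per_chip_hour
    return tokens_per_second * 3600.0 / dollars_per_hour


if __name__ == "__main__":
    # Hypothetical systems with equal throughput but very different chip counts.
    few_expensive = tokens_per_dollar(10_000, chips=8, cost_per_chip_hour=20.0)
    many_cheap = tokens_per_dollar(10_000, chips=568, cost_per_chip_hour=0.2)
    print(f"8 chips:   {few_expensive:,.0f} tokens/$")
    print(f"568 chips: {many_cheap:,.0f} tokens/$")
```

With these made-up prices the 568-chip system wins; with different prices it loses, which is why chip count alone doesn't settle the question.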

[1] https://www.semianalysis.com/p/groq-inference-tokenomics-spe...
