| HN Mirror



	by pama 800 days ago
	Sure, but the point of the comment was SRAM. Some ML people are confused about the different hardware memories, their latencies, and their bandwidths. We don’t all need to write kernels like Tri Dao to make transformers efficient on GPUs, but it would be great if more people were aware of the theoretical compute constraints of each type of model on a given piece of hardware, and if a subset of them then worked towards building better pipelines.

1 comment

kkielhofner 800 days ago

Your parent comment (by my reading) implied that the H100 "does just fine" with only 50MB of SRAM.

The reason Groq needs multiple racks of chips to serve models that fit in a single H100 is that Groq chips are SRAM-only, while the H100 has 80GB of HBM VRAM bolted onto it in addition to its SRAM.
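
A rough back-of-the-envelope sketch of that capacity argument, counting only the chips needed to hold the weights. The 80GB and 50MB figures are the ones quoted in this thread; the 70GB model size and the 230MB of SRAM per SRAM-only chip are assumptions picked for illustration, and `chips_needed` is a hypothetical helper. It ignores activations, KV cache, and any replication across chips.

```python
def chips_needed(model_bytes: int, per_chip_bytes: int) -> int:
    """Minimum number of chips whose combined memory can hold the weights."""
    # Integer ceiling division, so no float rounding is involved.
    return -(-model_bytes // per_chip_bytes)


GB = 10**9
MB = 10**6

model_bytes = 70 * GB        # assumption: a model that just fits in one H100
h100_hbm = 80 * GB           # HBM figure from the comment above
h100_sram = 50 * MB          # SRAM figure from the comment above
sram_only_chip = 230 * MB    # assumption: on-chip SRAM of an SRAM-only accelerator

print(chips_needed(model_bytes, h100_hbm))        # 1
print(chips_needed(model_bytes, sram_only_chip))  # 305
print(chips_needed(model_bytes, h100_sram))       # 1400
```

Under these assumptions the same weights that sit in one H100's HBM would take hundreds of SRAM-only chips, which is the "multiple racks" point.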


pama 800 days ago

I see. You are right. I also don’t think Groq would be friendly to the home user.
