	by Const-me 941 days ago
	> How did we overcome the HBM <-> SRAM bottleneck? Because every number we load from the model through that bottleneck gets reused, to compute different requests within the batch.
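A rough back-of-the-envelope sketch of the reuse argument in that quote, not anything taken from the quoted article. It assumes a plain matrix-vector product per request, 2 FLOPs (one multiply-add) per weight per request, and 16-bit weights; the function name `arithmetic_intensity` and its parameters are made up for illustration.

```python
def arithmetic_intensity(batch_size: int, bytes_per_weight: int = 2) -> float:
    """FLOPs performed per byte of weights loaded from HBM (hypothetical model).

    Assumes each weight is loaded once per batch and then used in one
    multiply-add (2 FLOPs) for every request in that batch.
    """
    flops_per_weight = 2 * batch_size
    return flops_per_weight / bytes_per_weight


if __name__ == "__main__":
    # Larger batches do more compute for the same memory traffic.
    for batch_size in (1, 8, 64):
        print(batch_size, arithmetic_intensity(batch_size))
```

Under these assumptions the work done per byte loaded grows linearly with the batch size, which is the sense in which batching takes pressure off the HBM <-> SRAM link.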