| HN Mirror



	by ata_aman 489 days ago
	Inference speed depends far more on memory read/write speed than on memory size. As long as the model fits in memory, memory bandwidth is what determines performance.

1 comment

menaerus 488 days ago

This is not universally true, although I see this claim repeated here far too often. It is especially untrue for small models, which are compute-bound.
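
For anyone who wants to check which side of the argument applies to their own setup, here is a rough roofline-style sketch. It assumes every weight is read from memory once per decode step and that each parameter costs about 2 FLOPs per token. The function names and the hardware figures in the usage note are hypothetical, not measurements.

```python
def decode_step_times(params, bytes_per_param, batch, mem_bw_bytes_s, peak_flops):
    """Estimate (memory_time_s, compute_time_s) for one decode step.

    Assumptions (simplifications, not measurements):
      - all weights are streamed from memory once per step,
      - each parameter costs ~2 FLOPs per token (one multiply, one add),
      - caches, KV-cache traffic and kernel overheads are ignored.
    """
    weight_bytes = params * bytes_per_param
    memory_time = weight_bytes / mem_bw_bytes_s
    flops = 2 * params * batch
    compute_time = flops / peak_flops
    return memory_time, compute_time


def bound(params, bytes_per_param, batch, mem_bw_bytes_s, peak_flops):
    """Return 'memory' or 'compute', whichever estimated time is larger."""
    memory_time, compute_time = decode_step_times(
        params, bytes_per_param, batch, mem_bw_bytes_s, peak_flops
    )
    return "memory" if memory_time >= compute_time else "compute"
```

With made-up hardware at 100 GB/s and 1 TFLOP/s, `bound(7e9, 2, 1, 100e9, 1e12)` comes out as memory-bound, while the same call with `batch=32` comes out as compute-bound. In this simplified model the answer depends on the hardware's FLOPs-to-bandwidth ratio and on how many tokens share each weight read, not on parameter count, so it cannot settle the small-model case by itself. Effects it leaves out, such as caches and per-kernel overhead, would need to be measured.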
