| HN Mirror

	by singhrac 279 days ago
	I think we might just disagree about how much of the GPU spend goes to small vs. large models (inference or training). My guess is that something like 99.9% of the spending interest is in models that don’t fit into 128 GB (remember that the KV cache takes up memory too, not just the weights). Happy to be proven wrong!
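For a rough sense of the "fits into 128 GB" question, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the helper names are hypothetical, the model shape (70B parameters, 80 layers, 8 KV heads, head dimension 128) is a made-up configuration rather than any specific model, and it only counts fp16 weights plus a standard per-layer K/V cache, ignoring activations and framework overhead.

```python
GB = 10**9  # decimal gigabytes, to match the "128 GB" figure


def weights_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Memory for the model weights (2 bytes per parameter for fp16/bf16)."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """Memory for the KV cache: one K and one V tensor per layer."""
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def fits(total_bytes: int, budget_gb: int = 128) -> bool:
    """True if the footprint is within the memory budget."""
    return total_bytes <= budget_gb * GB


# Hypothetical configuration, chosen only to illustrate the arithmetic.
w = weights_bytes(70_000_000_000)
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                    seq_len=32_768, batch_size=8)
total = w + kv

print(f"weights:  {w / GB:.1f} GB")
print(f"KV cache: {kv / GB:.1f} GB")
print(f"total:    {total / GB:.1f} GB, fits in 128 GB: {fits(total)}")
```

With these assumed numbers the KV cache alone is a large fraction of the budget, which is the point of the parenthetical: long contexts and bigger batches can push a model over 128 GB even when the weights look manageable on their own.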