| HN Mirror


	by umangsh 1197 days ago
	30B fp16 takes ~500 ms/token on M2 Max 96GB. Interestingly, that's the same performance as 65B q4 quantized. 65B fp16 is ungodly slow, ~300,000 ms/token on the same machine.
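
	For anyone who wants to sanity-check those numbers, here is a rough back-of-the-envelope sketch. It assumes nominal parameter counts (30B, 65B), exactly 16 bits per weight for fp16 and 4 bits for q4 (real q4 formats store extra scale data, so they are somewhat larger), and it ignores the KV cache and runtime overhead. The helper names are made up for illustration. It only shows which weight sets fit in 96 GB; that the 65B fp16 slowdown comes from swapping is my guess, not something measured here.

	```python
	# Rough weight-size and throughput arithmetic for the figures quoted above.
	# Assumptions (not from the original comment): nominal parameter counts,
	# 16 bits/weight for fp16, 4 bits/weight for q4, no KV cache or overhead.

	RAM_GB = 96  # unified memory on the M2 Max mentioned above


	def weight_gb(params_billion, bits_per_weight):
	    """Approximate size of the weights in GB (1 GB = 10^9 bytes)."""
	    return params_billion * bits_per_weight / 8


	def tokens_per_second(ms_per_token):
	    """Convert a per-token latency in milliseconds to tokens per second."""
	    return 1000.0 / ms_per_token


	configs = {
	    "30B fp16": (30, 16),
	    "65B q4": (65, 4),
	    "65B fp16": (65, 16),
	}

	for name, (params, bits) in configs.items():
	    size = weight_gb(params, bits)
	    fits = size < RAM_GB
	    print(f"{name}: ~{size:.1f} GB of weights, fits in {RAM_GB} GB: {fits}")

	# Latencies quoted in the comment, converted to throughput.
	for label, ms in (("~500 ms/token", 500), ("~300,000 ms/token", 300_000)):
	    print(f"{label} is about {tokens_per_second(ms):.4f} tokens/s")
	```

	Under these assumptions 65B fp16 is the only one of the three whose weights alone exceed 96 GB, and 300,000 ms/token works out to about five minutes per token.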