| HN Mirror



	by OsamaJaber 63 days ago
	Small models in the browser are a different optimization problem than small models on a server. On a server you chase throughput, so you batch. In the browser you're stuck at batch size 1, which means kernel launch overhead and memory bandwidth dominate, not FLOPs.
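
	To put rough numbers on the bandwidth part of that claim, here is a back-of-the-envelope sketch of arithmetic intensity (FLOPs per byte moved) for a single dense layer. Everything in it is an assumption for illustration: the function names, the 4096x4096 fp16 layer, and the peak compute and bandwidth figures are hypothetical, not measurements from any real browser or GPU. It also ignores kernel launch overhead entirely.

```python
def arithmetic_intensity(batch, d_in, d_out, bytes_per_value=2):
    """FLOPs per byte moved for one dense layer y = x @ W (no bias).

    Assumes weights and activations are each read or written once
    and stored at bytes_per_value bytes (2 = fp16).
    """
    flops = 2 * batch * d_in * d_out                       # one multiply + one add per weight per row
    weight_bytes = d_in * d_out * bytes_per_value          # W is read once regardless of batch
    activation_bytes = batch * (d_in + d_out) * bytes_per_value  # read x, write y
    return flops / (weight_bytes + activation_bytes)


def limiting_resource(batch, d_in, d_out, peak_flops, peak_bytes_per_s):
    """Simple roofline check: which resource caps this layer."""
    ridge = peak_flops / peak_bytes_per_s  # intensity where compute and bandwidth balance
    if arithmetic_intensity(batch, d_in, d_out) < ridge:
        return "memory"
    return "compute"


if __name__ == "__main__":
    # Hypothetical device: 10 TFLOP/s compute, 200 GB/s memory bandwidth.
    PEAK_FLOPS = 10e12
    PEAK_BW = 200e9
    for batch in (1, 8, 64):
        ai = arithmetic_intensity(batch, 4096, 4096)
        kind = limiting_resource(batch, 4096, 4096, PEAK_FLOPS, PEAK_BW)
        print(f"batch={batch:3d}  intensity={ai:6.2f} FLOP/byte  bound={kind}")
```

	At batch 1 the layer does about one FLOP per byte of weights it has to stream in, so under these assumptions the hypothetical device spends nearly all its time waiting on memory. Batching reuses each loaded weight across rows, which is exactly the lever a single-user browser session doesn't have.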