Hacker News new | ask | show | jobs
by OsamaJaber 63 days ago
Small models in the browser are a different optimization problem than small models on a server. On server you chase throughput so you batch. In browser you're stuck at batch size 1, which means kernel launch overhead and memory bandwidth dominate, not FLOPs