by OsamaJaber
63 days ago
Small models in the browser are a different optimization problem than small models on a server.
On a server you chase throughput, so you batch. In the browser you're stuck at batch size 1, which means kernel launch overhead and memory bandwidth dominate, not FLOPs.
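
To put rough numbers on the bandwidth point, here is a back-of-the-envelope roofline sketch for a single fp16 linear layer. The function names and the hardware figures (10 TFLOP/s peak, 200 GB/s bandwidth) are illustrative assumptions, not measurements of any real device, and kernel launch overhead is not modeled at all.

```python
def arithmetic_intensity(d_in: int, d_out: int, batch: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for y = W @ x with W of shape (d_out, d_in)."""
    flops = 2 * d_in * d_out * batch                    # one multiply + one add per weight per sample
    weight_bytes = d_in * d_out * bytes_per_elem        # weights are read once, whatever the batch
    act_bytes = batch * (d_in + d_out) * bytes_per_elem # inputs read, outputs written
    return flops / (weight_bytes + act_bytes)


def regime(d_in: int, d_out: int, batch: int, peak_flops: float, bandwidth: float) -> str:
    """Classify the layer against the roofline ridge point (peak FLOP/s divided by bytes/s)."""
    ridge = peak_flops / bandwidth
    if arithmetic_intensity(d_in, d_out, batch) < ridge:
        return "memory-bound"
    return "compute-bound"


# Hypothetical device: 10 TFLOP/s and 200 GB/s (assumed values for illustration only).
PEAK_FLOPS = 10e12
BANDWIDTH = 200e9

for batch in (1, 256):
    ai = arithmetic_intensity(4096, 4096, batch)
    print(batch, round(ai, 2), regime(4096, 4096, batch, PEAK_FLOPS, BANDWIDTH))
```

At batch size 1 the layer does about one FLOP per byte of weights it reads, so under these assumptions the time goes to moving weights and the arithmetic units sit mostly idle. Batching reuses each weight across many samples, which is the lever a server has and a single-user browser session does not.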