Hacker News
by Const-me 941 days ago
> How did we overcome the HBM <-> SRAM bottleneck?

Because every number we load from the model through that bottleneck gets reused to compute the different requests within the batch.
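
A rough back-of-envelope sketch of that reuse, not anything taken from the original post: for a single dense layer, the weights cross the bottleneck once per batch, while the arithmetic grows with the batch size. The function name, the 4096x4096 layer size, and the 2-byte (fp16) weights are all assumptions picked for illustration, and activation traffic is ignored.

```python
def flops_per_weight_byte(batch_size: int, d_in: int, d_out: int,
                          bytes_per_weight: int = 2) -> float:
    """FLOPs done per byte of weights streamed in, for one dense layer.

    Hypothetical helper: assumes the weight matrix is loaded once per batch
    and ignores the bytes moved for activations.
    """
    # Each weight takes part in one multiply and one add per request in the batch.
    flops = 2 * batch_size * d_in * d_out
    # The weight matrix crosses the bottleneck once, whatever the batch size.
    weight_bytes = d_in * d_out * bytes_per_weight
    return flops / weight_bytes


for batch_size in (1, 8, 64):
    print(batch_size, flops_per_weight_byte(batch_size, d_in=4096, d_out=4096))
# prints:
# 1 1.0
# 8 8.0
# 64 64.0
```

Under these assumptions the work done per loaded byte scales linearly with the batch size, which is the reuse described above: the same bandwidth feeds 64 times more math at batch 64 than at batch 1.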