by eurekin 39 days ago
Batching lowers that, since the model is read from memory once per batch rather than once per request. Activation accumulation doesn't scale as nicely, though, since it grows with the batch.
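
Rough back-of-the-envelope sketch of the trade-off, in case it helps. Everything here is an assumption for illustration: the function names are made up, the 14 GB weight size and 500 MB per-sequence activation size are hypothetical, and it assumes weights are streamed once per forward step and shared by the whole batch while activations are held per sequence.

```python
def weight_bytes_read_per_sequence(weight_bytes: int, batch_size: int) -> float:
    """Weight traffic attributed to each sequence for one forward step.

    Assumes the weights are read from memory once per step and shared
    by every sequence in the batch, so the cost is amortized.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return weight_bytes / batch_size


def activation_bytes(per_sequence_bytes: int, batch_size: int) -> int:
    """Total activation memory, assuming it is held separately per sequence."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return per_sequence_bytes * batch_size


# Hypothetical sizes, chosen only to make the trend visible.
WEIGHT_BYTES = 14 * 10**9          # e.g. a model stored in about 14 GB
PER_SEQ_ACT_BYTES = 500 * 10**6    # e.g. about 500 MB of activations per sequence

if __name__ == "__main__":
    for batch in (1, 8, 64):
        amortized = weight_bytes_read_per_sequence(WEIGHT_BYTES, batch)
        acts = activation_bytes(PER_SEQ_ACT_BYTES, batch)
        print(f"batch={batch:3d}  "
              f"weights read per sequence={amortized / 1e9:6.2f} GB  "
              f"total activations={acts / 1e9:6.2f} GB")
```

Under those assumptions the weight traffic per sequence falls as 1/batch, while total activation memory rises linearly with batch, so at some batch size the activations become the limit instead of the weights.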