Hacker News
by flumes_whims_ 21 days ago
The overhead shrinks with larger models. It doesn't seem that bad.

https://arxiv.org/pdf/2409.03992v2