Hacker News new | ask | show | jobs
by YetAnotherNick 316 days ago
Those assumptions are still the same. Although now context length has increased more so the n^2 part is non negligible. See the repo for correct flop calculation[1]

[1]: https://github.com/facebookresearch/lingua/blob/437d680e5218...