Hacker News
by CamperBob2 317 days ago
That paper's 5 years old at this point, dating back to when Amodei was still an OpenAI employee. Has any newer work superseded it, or are those assumptions still considered solid?
1 comment

Those assumptions still hold. However, context lengths have grown enough since then that the n^2 attention term is no longer negligible. See the repo for the correct FLOP calculation [1].
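
For a rough sense of scale, here is a minimal sketch of the commonly used per-token approximation: 6 FLOPs per non-embedding parameter, plus an attention term that grows with context length. The function names and the model shape below are made up for illustration, and this is not necessarily the exact code in the linked repo. Note the attention term is linear in n per token, so it becomes n^2 per sequence.

```python
def flops_per_token(n_params: int, n_layers: int, d_model: int, seq_len: int) -> int:
    """Approximate training FLOPs per token (forward + backward)."""
    # Dense matmuls: 2 FLOPs per parameter forward, about 4 backward.
    dense = 6 * n_params
    # Attention scores (QK^T) plus the weighted sum over V:
    # 4 * d_model * seq_len FLOPs per layer forward, times 3 for forward + backward.
    attention = 12 * n_layers * d_model * seq_len
    return dense + attention


def attention_share(n_params: int, n_layers: int, d_model: int, seq_len: int) -> float:
    """Fraction of per-token FLOPs that comes from the context-dependent term."""
    attention = 12 * n_layers * d_model * seq_len
    return attention / flops_per_token(n_params, n_layers, d_model, seq_len)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    n_params, n_layers, d_model = 7_000_000_000, 32, 4096
    for seq_len in (2_048, 32_768, 131_072):
        share = attention_share(n_params, n_layers, d_model, seq_len)
        print(f"seq_len={seq_len:>7}: attention share of FLOPs = {share:.1%}")
```

With a shape like that, the attention term is a small correction at a 2k context but ends up dominating at 128k, which is why the plain 6N estimate undercounts for long-context training.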

[1]: https://github.com/facebookresearch/lingua/blob/437d680e5218...