That paper's 5 years old at this point, dating back to when Amodei was still an OpenAI employee. Has any newer work superseded it, or are those assumptions still considered solid?
Those assumptions still hold. Context lengths have grown a lot since then, though, so the n^2 attention term is no longer negligible. See the repo for the correct FLOP calculation [1].
[1]: https://github.com/facebookresearch/lingua/blob/437d680e5218...
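To put a rough number on the quadratic term, here is a minimal sketch of the usual back-of-the-envelope estimate: about 6 FLOPs per non-embedding parameter per token for the weight matmuls, plus an attention term that grows with context length. The function names, the `causal` flag, and the example model shape are my own illustration, not copied from the linked repo, so check [1] for what it actually computes.

```python
def train_flops_per_token(n_params, n_layers, d_model, seq_len, causal=False):
    """Approximate training FLOPs per token (forward + backward).

    Assumptions (not taken from the linked repo):
      - n_params counts non-embedding parameters only
      - backward pass costs about 2x the forward pass
      - attention width equals d_model
    """
    # Weight matmuls: 2 FLOPs per parameter forward, 4 backward.
    dense = 6 * n_params
    # QK^T and attention-weighted V: 4 * seq_len * d_model per layer forward,
    # tripled to include backward. Linear in seq_len per token, hence
    # quadratic per sequence.
    attn = 12 * n_layers * d_model * seq_len
    if causal:
        # With causal masking each token attends to ~seq_len / 2 positions on average.
        attn //= 2
    return dense + attn


def attention_share(n_params, n_layers, d_model, seq_len, causal=False):
    """Fraction of per-token training FLOPs spent in the attention term."""
    total = train_flops_per_token(n_params, n_layers, d_model, seq_len, causal)
    return (total - 6 * n_params) / total


if __name__ == "__main__":
    # Hypothetical 7B-class shape, chosen only for illustration.
    shape = dict(n_params=6_500_000_000, n_layers=32, d_model=4096)
    for seq_len in (2048, 32768, 131072):
        share = attention_share(seq_len=seq_len, **shape)
        print(f"seq_len={seq_len:>6}: attention share = {share:.1%}")
```

For a shape like this the attention share is under a tenth at 2k context and becomes the majority of the compute at 128k, which is why the plain 6ND rule undercounts for long-context training.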