Hacker News
by
YetAnotherNick
316 days ago
Those assumptions are still the same, although context lengths have since grown enough that the n^2 part is no longer negligible. See the repo for the correct FLOP calculation [1].
[1]: https://github.com/facebookresearch/lingua/blob/437d680e5218...
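
To make the n^2 point concrete, here is a rough sketch of how the attention term grows with context length. It is not taken from the linked repo. It assumes the commonly used approximation of 6 FLOPs per non-embedding parameter per training token, plus 12 * layers * d_model * seq_len for attention (quadratic per sequence, so linear per token), and it ignores causal masking. The function names and the 7B-style config are hypothetical.

```python
def train_flops_per_token(n_params, n_layers, d_model, seq_len):
    """Approximate training FLOPs per token, split into two parts.

    Assumption: 6 FLOPs per non-embedding parameter (forward + backward)
    and 12 * n_layers * d_model * seq_len for the attention score and
    value matmuls. Causal masking, which can roughly halve the attention
    part, is ignored here.
    """
    dense = 6 * n_params
    attn = 12 * n_layers * d_model * seq_len
    return dense, attn


def attention_share(n_params, n_layers, d_model, seq_len):
    """Fraction of per-token training FLOPs spent in the attention term."""
    dense, attn = train_flops_per_token(n_params, n_layers, d_model, seq_len)
    return attn / (dense + attn)


if __name__ == "__main__":
    # Hypothetical 7B-style config, for illustration only.
    config = dict(n_params=7_000_000_000, n_layers=32, d_model=4096)
    for seq_len in (2048, 8192, 32768):
        share = attention_share(seq_len=seq_len, **config)
        print(f"seq_len={seq_len:>6}: attention share = {share:.1%}")
```

Under these assumptions the attention term is about 7% of the total at 2048 tokens and more than half at 32768 tokens, which is why dropping it stops being a safe simplification at long context.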