by londons_explore 1179 days ago
Training compute goes up with approximately the 3rd power of the window size.

So going from a 4k window to a 32k window means they'd need 512x the compute (32k/4k = 8, and 8^3 = 512), just to maintain similar output quality.
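A quick sketch of that arithmetic, assuming (as the comment does) that training compute scales as the window size raised to a fixed power. The function name `compute_multiplier` and the `exponent` parameter are hypothetical and only illustrate the calculation. The exponent of 3 is the comment's rough estimate, not a measured value.

```python
def compute_multiplier(old_window: int, new_window: int, exponent: float = 3.0) -> float:
    """Return how many times more training compute is needed when the context
    window grows from old_window to new_window, ASSUMING compute ~ window**exponent.
    The default exponent of 3 is the comment's rough estimate, not a measured value."""
    return (new_window / old_window) ** exponent


# 4k -> 32k tokens is an 8x longer window; under the cubic assumption, 8**3 = 512.
print(compute_multiplier(4_096, 32_768))  # 512.0
```

Keeping the exponent as a parameter makes it easy to see how much the headline number depends on that assumption.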

I suspect they must have found a better solution to be able to scale the window so big. They haven't announced what it is.


Very interesting, thanks