
Training compute scales with approximately the third power of the context window size.

So going from a 4k window to a 32k window (8x larger) means roughly an 8^3 = 512x increase in the compute they'd need, just to maintain similar output quality.
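To make the arithmetic explicit, here is a quick sketch. It assumes compute scales as window size raised to a fixed exponent (3, per the claim above); the function name `compute_multiplier` and the reading of "4k" and "32k" as 4096 and 32768 tokens are my own assumptions for illustration. Only the 8x ratio between the two windows matters for the result.

```python
def compute_multiplier(old_window: int, new_window: int, exponent: float = 3.0) -> float:
    """Relative training compute, assuming cost scales as window_size ** exponent.

    The default exponent of 3 is the assumption from the comment above,
    not a measured value.
    """
    return (new_window / old_window) ** exponent


# 4k -> 32k is an 8x larger window; cubing that gives the 512x figure.
cubic = compute_multiplier(4096, 32768)
print(cubic)  # 512.0

# Same window change under a quadratic assumption, for comparison.
quadratic = compute_multiplier(4096, 32768, exponent=2)
print(quadratic)  # 64.0
```

The exponent is left as a parameter because the estimate is very sensitive to it: the same 8x window increase gives 64x under quadratic scaling and 512x under cubic scaling.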

I suspect they must have found a better approach to be able to scale the window that far. They haven't announced what it is.