by londons_explore
1179 days ago
Training compute grows with roughly the third power of the window size. Going from a 4k window to a 32k window is an 8x increase, so they'd need about 8^3 = 512x the compute just to maintain similar output quality. I suspect they must have found a better approach to scale the window that far, but they haven't announced what it is.
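For what it's worth, the arithmetic is easy to sketch in a few lines of Python. The cubic exponent here is only the rough scaling assumed in this comment, not a measured figure, and `compute_multiplier` is a made-up helper name for illustration.

```python
def compute_multiplier(old_window: int, new_window: int, exponent: float = 3.0) -> float:
    """Estimate how much more training compute a larger context window needs.

    Assumes compute scales as window_size ** exponent. The default exponent
    of 3 is the rough assumption from the comment, not an established law.
    """
    if old_window <= 0 or new_window <= 0:
        raise ValueError("window sizes must be positive")
    ratio = new_window / old_window  # 32768 / 4096 = 8.0
    return ratio ** exponent


# 4k -> 32k window under the assumed cubic scaling
print(compute_multiplier(4096, 32768))  # 512.0
```

Passing a different `exponent` shows how sensitive the 512x figure is to that assumption.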