Hacker News
by londons_explore 1179 days ago
I suspect that with a 'window' of 32k tokens, OpenAI has already done similar memory tricks.

I suspect that if you filled the context window with "1 1 1 1 1 1 1 1 1 1" and then asked "How many 1's did I just show you?", it probably wouldn't know. Whatever tricks give it such an apparently large context window likely don't let it 'see' all of it at any given moment.
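To make the hypothesis concrete, here is a minimal sketch of one well-known trick with this property: sliding-window (local) attention, where each position can attend only to the previous `window` positions. This is an illustrative assumption, not a claim about what OpenAI actually uses. `make_probe`, `sliding_window_mask`, `visible_from_last` and the window size are hypothetical, and treating each "1" as one token is also an assumption (real tokenizers differ).

```python
def make_probe(n_ones: int) -> tuple[str, int]:
    """Build the '1 1 1 ...' probe prompt and the correct answer."""
    filler = " ".join(["1"] * n_ones)
    question = "How many 1's did I just show you?"
    return f"{filler}\n{question}", n_ones


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if position i may attend to position j.

    Causal (j <= i) and local (i - j < window): each position sees at
    most `window` positions, itself included.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def visible_from_last(seq_len: int, window: int) -> int:
    """How many positions the final token can attend to in a single layer."""
    mask = sliding_window_mask(seq_len, window)
    return sum(mask[-1])


if __name__ == "__main__":
    prompt, answer = make_probe(10)
    print(prompt)
    # Assumption: one token per "1" (hypothetical, for illustration only).
    seq_len = answer
    print(visible_from_last(seq_len, window=4), "of", seq_len, "positions visible")
```

With a window of 4, the last position directly sees only 4 of the 10 positions in one layer. Stacking layers lets information propagate further than one window, so this shows the per-layer limit rather than proving such a model could never count the 1s.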


Ah, so you think the 32k context window works differently than, e.g., the 4k davinci context window? They didn't just increase ${hyperparam}?
Training compute goes up with approximately the 3rd power of the window size.

So going from a 4k window to a 32k window (8x larger, and 8³ = 512) means they'd need 512x the compute, just to maintain similar output quality.
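The arithmetic above can be checked directly. The sketch below takes the cubic exponent as the commenter's assumption (an estimate, not an established scaling law) and, for comparison, also shows the quadratic growth of the n x n attention score matrix in standard self-attention, which is a well-known property of that layer. Other parts of the model, such as the feed-forward layers, grow only linearly with sequence length, so neither number is a full compute estimate. `compute_multiplier` is a hypothetical helper, and reading "4k" and "32k" as 4,096 and 32,768 is an assumption.

```python
def compute_multiplier(old_window: int, new_window: int, exponent: float) -> float:
    """Relative cost if cost scales as window_size ** exponent."""
    return (new_window / old_window) ** exponent


if __name__ == "__main__":
    old, new = 4_096, 32_768  # "4k" and "32k" read as powers of two (assumption)
    print(f"window grows {new // old}x")
    # Commenter's assumption: training compute ~ window ** 3.
    print(f"cubic:     {compute_multiplier(old, new, 3):.0f}x")
    # One n x n attention score matrix per head grows ~ window ** 2.
    print(f"quadratic: {compute_multiplier(old, new, 2):.0f}x")
```

Running it prints an 8x window increase, 512x under the cubic assumption, and 64x for the quadratic attention-matrix term.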

I suspect they found a better approach that lets them scale the window that far. They haven't announced what it is.

Very interesting, thanks