by evergreener 1277 days ago
Does anyone know how OpenAI (and others) are extending the context windows of models like ChatGPT this far? E.g. if you exceed 2048/8192 (subword) tokens, does the model just chunk the input and evaluate each chunk separately? Is context/state maintained across chunks? I've never seen anyone actually explain this.
3 comments

https://help.openai.com/en/articles/6787051-does-chatgpt-rem...

> While ChatGPT is able to remember what the user has said earlier in the conversation, there is a limit to how much information it can retain. The model is able to reference up to approximately 3000 words (or 4000 tokens) from the current conversation - any information beyond that is not stored.

This implies ChatGPT has a maximum prompt of about 4000 tokens, and that prior prompts in a given web session are inserted into the current prompt, most recent first (probably with some sort of framing like "previously, user asked:"), until the 4000-token limit is reached.
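To make that guess concrete, here is a minimal sketch of packing earlier turns into a fixed budget, most recent first. It is only an illustration of the idea, not how ChatGPT is known to work: `build_prompt` and `count_tokens` are hypothetical names, and the word-count tokenizer is a crude stand-in for a real subword tokenizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def build_prompt(
    history: List[str],
    new_message: str,
    budget: int = 4000,
    count: Callable[[str], int] = count_tokens,
) -> str:
    """Pack the new message plus as many recent turns as fit within `budget` tokens."""
    used = count(new_message)
    kept: List[str] = []
    # Walk the history from most recent to oldest and stop at the
    # first turn that would push the prompt over the budget.
    for turn in reversed(history):
        cost = count(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    # Restore chronological order so the prompt reads oldest to newest.
    kept.reverse()
    return "\n".join(kept + [new_message])


if __name__ == "__main__":
    turns = ["a b c", "d e", "f g h i"]
    print(build_prompt(turns, "j k", budget=8))
```

With a budget of 8 "tokens", the oldest turn is dropped and the two most recent turns are kept ahead of the new message. Anything older than the cutoff is simply gone, which matches the "not stored" wording in the help article.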

I've had longer discussions but I'm realising that I often ask for a summary, which would mean the model has a summary of the conversation so far in the window.

What's the technical limit? The width of the attention layer?
I have been playing with their completions API. I just keep track of previous conversational lines, and when I approach a configurable token threshold (input plus output must fit within the model's token limit, and the API reports the token count with each response), I instruct ChatGPT to summarize the conversation thus far, with additional specific instructions to help it keep useful bits of context. I then make that summary part of the context I send in, along with the conversational lines I kept and any future ones.
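A rough sketch of that bookkeeping, in case it helps. Everything here is an assumption for illustration: `SummarizingContext`, `keep_last`, and the `summarize` callable are hypothetical, the token counter is a word-count stand-in, and in practice `summarize` would wrap a call to the model with the extra "keep the useful bits" instructions.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class SummarizingContext:
    """Keeps recent lines verbatim and folds older ones into a running summary."""

    def __init__(
        self,
        summarize: Callable[[str], str],  # hypothetical: would call the model
        threshold: int = 3000,            # token count that triggers a summary
        keep_last: int = 4,               # recent lines to keep word for word
        count: Callable[[str], int] = count_tokens,
    ) -> None:
        self.summarize = summarize
        self.threshold = threshold
        self.keep_last = keep_last
        self.count = count
        self.summary = ""
        self.lines: List[str] = []

    def context(self) -> str:
        """Text to send with the next request: summary (if any) plus kept lines."""
        parts: List[str] = []
        if self.summary:
            parts.append("Summary of the conversation so far: " + self.summary)
        parts.extend(self.lines)
        return "\n".join(parts)

    def add(self, line: str) -> None:
        """Record a line and compress older lines once the threshold is exceeded."""
        self.lines.append(line)
        over = self.count(self.context()) > self.threshold
        if over and len(self.lines) > self.keep_last:
            split = len(self.lines) - self.keep_last
            old, kept = self.lines[:split], self.lines[split:]
            # Fold the previous summary in so earlier context is not lost.
            material = ([self.summary] if self.summary else []) + old
            self.summary = self.summarize("\n".join(material))
            self.lines = kept
```

Usage would be something like `ctx = SummarizingContext(my_summarizer)`, then `ctx.add(...)` for each user and model line and `ctx.context()` as the prefix of the next prompt. The threshold needs headroom below the real limit, since the reply has to fit too.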

Their API calls on the site have references to previous message ids, which makes me expect they're doing something similar.

I wonder if they tack on a neural network with a larger input set as preprocessing with convolutions or something beforehand, and pass those into the inputs of the large model. It’s something I’d try but I have no idea if that’s what they’re doing.