| HN Mirror



	by evergreener 1324 days ago
	Does anyone know how OpenAI (and others) are extending the context windows of things like ChatGPT so far? E.g. if you exceed 2048/8192 (subword) tokens, does the model just chunk the inputs and evaluate the chunks separately? Is context/state maintained across chunks? I've never seen anyone actually explain this.

3 comments

tshadley 1324 days ago

https://help.openai.com/en/articles/6787051-does-chatgpt-rem...

> While ChatGPT is able to remember what the user has said earlier in the conversation, there is a limit to how much information it can retain. The model is able to reference up to approximately 3000 words (or 4000 tokens) from the current conversation - any information beyond that is not stored.

This implies ChatGPT has a 4000-token maximum prompt, and that prior prompts in a given web session are inserted into the current prompt, most recent to oldest (probably with some sort of time context like "previously, user asked:"), up to 4000 tokens.
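A minimal sketch of that guess, not anything OpenAI has confirmed: walk the history newest first and keep turns until the budget runs out. `count_tokens` and `build_prompt` are hypothetical names, and the word-count tokenizer is a crude stand-in for a real subword tokenizer.

```python
def count_tokens(text):
    # Assumption: one token per whitespace-separated word.
    # A real subword tokenizer would give different (usually higher) counts.
    return len(text.split())


def build_prompt(history, new_message, budget=4000):
    """Pack the new message plus as many of the most recent turns as fit."""
    used = count_tokens(new_message)
    kept = []
    for turn in reversed(history):  # newest turn first
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # put the surviving turns back in chronological order
    return "\n".join(kept + [new_message])
```

Anything older than the cutoff simply disappears from the prompt, which would match the "any information beyond that is not stored" wording in the help article.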


IanCal 1324 days ago

I've had longer discussions but I'm realising that I often ask for a summary, which would mean the model has a summary of the conversation so far in the window.


melony 1324 days ago

What's the technical limit? The width of the attention layer?


knome 1324 days ago

I have been playing with their completions API. I just keep track of previous conversational lines, and when I approach a configurable token threshold (input + output must fit within the given number of tokens, and the API returns the count each time), I instruct ChatGPT to summarize the conversation thus far, with additional specific instructions to help it keep useful bits of context. I then make that summary part of the context I send in, along with the kept and future conversational lines.
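Roughly, the loop looks like the sketch below. This is an illustration under assumptions rather than my actual code: `Conversation`, `keep_last`, and the `summarize` callable are hypothetical names, `summarize` stands in for whatever call asks the model for a summary, and tokens are approximated by counting words.

```python
def count_tokens(text):
    # Assumption: word count as a rough proxy for subword tokens.
    return len(text.split())


class Conversation:
    def __init__(self, summarize, threshold=3000, keep_last=4):
        self.summarize = summarize  # hypothetical: (old_summary, old_lines) -> str
        self.threshold = threshold  # keep below the model limit to leave room for the reply
        self.keep_last = keep_last  # recent lines that are kept verbatim
        self.summary = ""
        self.lines = []

    def context(self):
        """Text sent to the model: running summary first, then the kept lines."""
        head = [f"Summary so far: {self.summary}"] if self.summary else []
        return "\n".join(head + self.lines)

    def add(self, line):
        self.lines.append(line)
        if count_tokens(self.context()) > self.threshold:
            cut = max(len(self.lines) - self.keep_last, 0)
            old, kept = self.lines[:cut], self.lines[cut:]
            if old:
                # Fold the older lines into the summary and drop them.
                self.summary = self.summarize(self.summary, old)
                self.lines = kept
```

The threshold is set below the hard limit on purpose, since the reply has to fit in the same token budget as the input.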

Their API calls on the site have references to previous message ids, which makes me expect they're doing something similar.


binarymax 1324 days ago

I wonder if they tack on a neural network with a larger input set as preprocessing with convolutions or something beforehand, and pass those into the inputs of the large model. It’s something I’d try but I have no idea if that’s what they’re doing.
