| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by MichaelZuo 1144 days ago
	Considering that increasing context length is O(n^2), and that current 8k GPT-4 is already restricted to 25 prompts/3 hours, I think they will launch it at substantially higher pricing.

5 comments

tempaccount420 1144 days ago

> current 8k GPT-4 is already restricted to 25 prompts/3 hours

I'm pretty sure they're using a 4k GPT-4 model for ChatGPT Plus, even though they only announced 8k and 32k... It can't handle more than 4k of tokens (actually a little below that, starts ignoring your last few sentences if you get close). If you check developer tools, the request to an API /models endpoint says the limit for GPT-4 is 4096. It's very unfortunate.

reaperman 1144 days ago

Ah this explains a lot. I couldn't understand why I couldn't get close to the ~12 pages that everyone was saying 8,000 tokens implied.

tempaccount420 1144 days ago

As far as I know it's not documented anywhere and there is no way to ask the team at ChatGPT questions. I sent them an email about it a few days after GPT-4 release and still haven't received a reply.

Another thing that annoys me is how most updates don't get a changelog entry. For whatever reason, they keep little secrets like that.

jiggawatts 1143 days ago

Their PR is terrible and I get the impression that they wish their own users would “just go away”.

Every time I see a company act like this, more responsive and truly open competition eventually eats their lunch.

int_19h 1144 days ago

The raw chat log has the system message on top, plus "user:" and "assistant:" for each message, and im_start/im_end tokens to separate messages, hence why the visible chat context is slightly under 4k.

cubefox 1144 days ago

O(n^2) seems unlikely:

https://cognitiverevolution.substack.com/p/openais-foundry-l....

https://news.ycombinator.com/item?id=34977194#:~:text=Sparse...

MichaelZuo 1144 days ago

Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.

space_fountain 1144 days ago

GPT3 was released 3 years ago now. There have been major advancements in scaling attention so it would be strange if they didn't use some of them

MichaelZuo 1142 days ago

It doesn't matter how many major advancements they made in scaling, as long as one component is O(n^2) or higher.

cubefox 1142 days ago

It's not the scale itself, it's the scaling architecture.

MichaelZuo 1142 days ago

The same applies.

choeger 1144 days ago

It will be interesting to see how far this quadratic algorithm carries in practice. Even the longest documents can only have hundreds of thousands of tokens, right?

sebzim4500 1144 days ago

Ideally you'd be able to put your entire codebase + documentation + jira tickets + etc. into the context. I think there is no practical limit to how many tokens would be useful for users, so the limits imposed by the model (either hard limits or just pricing) will always be a bottleneck.

jtbayly 1144 days ago

I'm confused by this. Would you want to just include your codebase, documentation, etc. in some last-mile training? That way you don't need the expense of including huge amounts of context in every query. It's baked in.

sebzim4500 1144 days ago

I haven't tried this myself, but it is my understanding that finetuning does not work well in practice as a way of acquiring new knowledge.

There may be a middle ground between these two approaches though. If every query used the same prompt prefix (because you only update the codebase + docs occasionally) then you could put it into the model once and cache the keys and values from the attention heads. I wonder if OpenAI does this with whatever prefix they use for ChatGPT?

sdenton4 1144 days ago

Yeah there's really three options here... Throw everything in context, fine tune, or add external search a la RETRO.

The latter is definitely the cheapest option; updates are trivial.

mlyle 1144 days ago

Yah... we really need some kind of architecture that juggles concept vectors around to external storage and does similarity search, etc, instead of forcing us to encode everything into giant tangles of coefficients.

GPT-4 seems to show that linear algebra definitely can do the job, but training is so expensive and the model gets so huge and inflexible.

It seems like having fixed format vectors of knowledge that the model can use-- denser and more precise than just incorporating tool results as tokens like OpenAI's plugin approach-- is a path forward towards extensibility and online learning.

Keyframe 1144 days ago

some of the context length will be lost to waste spent on truncated posts, or are replies not considered part of context on ChatGPT? In both cases, might be worth designing a prompt, every so often, to get a reply with which to re-establish the context, thus compressing it.

totoglazer 1144 days ago

It’s been available on Azure in preview. Pricing is double the 8K model.