Hacker News new | ask | show | jobs
by MichaelZuo 1144 days ago
Considering that increasing context length is O(n^2), and that current 8k GPT-4 is already restricted to 25 prompts/3 hours, I think they will launch it at substantially higher pricing.
5 comments

> current 8k GPT-4 is already restricted to 25 prompts/3 hours

I'm pretty sure they're using a 4k GPT-4 model for ChatGPT Plus, even though they only announced 8k and 32k... It can't handle more than 4k of tokens (actually a little below that, starts ignoring your last few sentences if you get close). If you check developer tools, the request to an API /models endpoint says the limit for GPT-4 is 4096. It's very unfortunate.

Ah this explains a lot. I couldn't understand why I couldn't get close to the ~12 pages that everyone was saying 8,000 tokens implied.
As far as I know it's not documented anywhere and there is no way to ask the team at ChatGPT questions. I sent them an email about it a few days after GPT-4 release and still haven't received a reply.

Another thing that annoys me is how most updates don't get a changelog entry. For whatever reason, they keep little secrets like that.

Their PR is terrible and I get the impression that they wish their own users would “just go away”.

Every time I see a company act like this, more responsive and truly open competition eventually eats their lunch.

The raw chat log has the system message on top, plus "user:" and "assistant:" for each message, and im_start/im_end tokens to separate messages, hence why the visible chat context is slightly under 4k.
Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.
GPT3 was released 3 years ago now. There have been major advancements in scaling attention so it would be strange if they didn't use some of them
It doesn't matter how many major advancements they made in scaling, as long as one component is O(n^2) or higher.
It's not the scale itself, it's the scaling architecture.
The same applies.
It will be interesting to see how far this quadratic algorithm carries in practice. Even the longest documents can only have hundreds of thousands of tokens, right?
Ideally you'd be able to put your entire codebase + documentation + jira tickets + etc. into the context. I think there is no practical limit to how many tokens would be useful for users, so the limits imposed by the model (either hard limits or just pricing) will always be a bottleneck.
I'm confused by this. Would you want to just include your codebase, documentation, etc. in some last-mile training? That way you don't need the expense of including huge amounts of context in every query. It's baked in.
I haven't tried this myself, but it is my understanding that finetuning does not work well in practice as a way of acquiring new knowledge.

There may be a middle ground between these two approaches though. If every query used the same prompt prefix (because you only update the codebase + docs occasionally) then you could put it into the model once and cache the keys and values from the attention heads. I wonder if OpenAI does this with whatever prefix they use for ChatGPT?

Yeah there's really three options here... Throw everything in context, fine tune, or add external search a la RETRO.

The latter is definitely the cheapest option; updates are trivial.

Yah... we really need some kind of architecture that juggles concept vectors around to external storage and does similarity search, etc, instead of forcing us to encode everything into giant tangles of coefficients.

GPT-4 seems to show that linear algebra definitely can do the job, but training is so expensive and the model gets so huge and inflexible.

It seems like having fixed format vectors of knowledge that the model can use-- denser and more precise than just incorporating tool results as tokens like OpenAI's plugin approach-- is a path forward towards extensibility and online learning.

some of the context length will be lost to waste spent on truncated posts, or are replies not considered part of context on ChatGPT? In both cases, might be worth designing a prompt, every so often, to get a reply with which to re-establish the context, thus compressing it.
It’s been available on Azure in preview. Pricing is double the 8K model.