Hacker News new | ask | show | jobs
by sebzim4500 1144 days ago
Ideally you'd be able to put your entire codebase + documentation + jira tickets + etc. into the context. I think there is no practical limit to how many tokens would be useful for users, so the limits imposed by the model (either hard limits or just pricing) will always be a bottleneck.
1 comments

I'm confused by this. Would you want to just include your codebase, documentation, etc. in some last-mile training? That way you don't need the expense of including huge amounts of context in every query. It's baked in.
I haven't tried this myself, but it is my understanding that finetuning does not work well in practice as a way of acquiring new knowledge.

There may be a middle ground between these two approaches though. If every query used the same prompt prefix (because you only update the codebase + docs occasionally) then you could put it into the model once and cache the keys and values from the attention heads. I wonder if OpenAI does this with whatever prefix they use for ChatGPT?

Yeah there's really three options here... Throw everything in context, fine tune, or add external search a la RETRO.

The latter is definitely the cheapest option; updates are trivial.

Yah... we really need some kind of architecture that juggles concept vectors around to external storage and does similarity search, etc, instead of forcing us to encode everything into giant tangles of coefficients.

GPT-4 seems to show that linear algebra definitely can do the job, but training is so expensive and the model gets so huge and inflexible.

It seems like having fixed format vectors of knowledge that the model can use-- denser and more precise than just incorporating tool results as tokens like OpenAI's plugin approach-- is a path forward towards extensibility and online learning.