Retrieval: augments the assistant with knowledge from outside our models, such as proprietary domain data, product information or documents provided by your users. This means you don’t need to compute and store embeddings for your documents, or implement chunking and search algorithms. The Assistants API optimizes what retrieval technique to use based on our experience building knowledge retrieval in ChatGPT.
The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:
it either passes the file content in the prompt for short documents, or
performs a vector search for longer documents
Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.
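To make the two techniques described above concrete, here is a minimal sketch of that kind of routing. It is not OpenAI's implementation: the token cutoff, the chunk size, the bag-of-words "embedding", and every function name below are assumptions chosen only for illustration, since the docs do not say where "short" ends or how chunks are embedded.

```python
import math
import re
from collections import Counter

# Hypothetical cutoff: the docs do not state where "short" ends.
SHORT_DOC_TOKEN_LIMIT = 200


def tokenize(text):
    """Crude word tokenizer standing in for a real model tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=40):
    """Split text into consecutive chunks of `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_context(document, query, top_k=2):
    """Return (strategy, passages) to place in the prompt for this query."""
    if len(tokenize(document)) <= SHORT_DOC_TOKEN_LIMIT:
        # Short document: pass the whole file content in the prompt.
        return "full_text", [document]
    # Longer document: rank chunks by similarity to the query, keep the best.
    query_vec = embed(query)
    ranked = sorted(chunk(document),
                    key=lambda c: cosine(query_vec, embed(c)),
                    reverse=True)
    return "vector_search", ranked[:top_k]
```

Calling `build_context(doc, "refund policy returns")` on a one-sentence document returns the `"full_text"` strategy with the document itself, while a document of a few hundred words returns `"vector_search"` with the `top_k` highest-scoring chunks.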
Really cool to see the Assistants API's nuanced document retrieval methods. Do you index over the text besides chunking it up and generating embeddings? I'm curious about the indexing and the depth of analysis for longer docs, like assessing an author's tone chapter by chapter—vector search might have its limits there. Plus, the process to shape user queries into retrievable embeddings seems complex. Eager to hear more about these strategies, at least what you can spill!
Embedding is a poor man's context length increase. It effectively extends your context length, but lossily.
There is still a cost argument to make: embedding-based approach will be cheaper and faster, but worse result than full text.
That being said, I don't see how those embedding startups compete with OpenAI. No one will be able to offer better embeddings than OpenAI itself, so it is hardly a convincing business.
The elephant in the room is that open source models aren't able to match OpenAI's models, and the gap is qualitative, not quantitative.
For embeddings specifically, there are multiple open source models that outperform OpenAI’s best model (text-embedding-ada-002) that you can see on the MTEB Leaderboard [1]
> embedding-based approach will be cheaper and faster, but worse result than full text
I’m not sure results would be worse. I think it depends on how well the models can ignore irrelevant context, which is a known problem [2]. Retrieval can come closer to providing only the relevant context.
The point isn't about the leaderboard. With increasing context length, the question is whether we need embeddings at all. With a longer context, embeddings are no longer a necessity, which lowers their value.
For more trivial use cases, sure, but not for harder stuff like working with US law and precedent.
The US Code is on the order of tens of millions of tokens and I shudder to think how many billions of tokens make up all the judicial opinions that set or interpreted precedent.
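A quick back-of-the-envelope check of that scale, as a rough sketch: the 30 million figure is a hypothetical point estimate for "tens of millions", and 128,000 tokens is the GPT-4 Turbo context window announced alongside the Assistants API.

```python
import math

CONTEXT_WINDOW_TOKENS = 128_000   # GPT-4 Turbo context window
US_CODE_TOKENS = 30_000_000       # hypothetical estimate for "tens of millions"


def windows_needed(corpus_tokens, window_tokens):
    """Number of full context windows required to hold the corpus once."""
    return math.ceil(corpus_tokens / window_tokens)


print(windows_needed(US_CODE_TOKENS, CONTEXT_WINDOW_TOKENS))  # prints 235
```

Even under that estimate the statutes alone would overflow a single prompt a couple of hundred times over, before counting any judicial opinions.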
OP is incorrect. Embeddings are still needed since (1) context windows can't contain all data and (2) data memorization and continuous retraining are not yet viable.
But the common use case of using a vector DB to pull in augmentation appears to now be handled by the Assistants API. I haven't dug into the details yet but it appears you can upload files and the contents will be used (likely with some sort of vector searching happening behind the scenes).
The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:
it either passes the file content in the prompt for short documents, or
performs a vector search for longer documents
Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.