| HN Mirror



	by colordrops 954 days ago
	I haven't been paying attention, why are embeddings not needed anymore?

4 comments

sharemywin 954 days ago

Retrieval: augments the assistant with knowledge from outside our models, such as proprietary domain data, product information or documents provided by your users. This means you don’t need to compute and store embeddings for your documents, or implement chunking and search algorithms. The Assistants API optimizes what retrieval technique to use based on our experience building knowledge retrieval in ChatGPT.

The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:

It either passes the file content in the prompt for short documents, or performs a vector search for longer documents.

Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.
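For readers who want something concrete, here is a minimal sketch of the two-way choice described in the quote. It is not OpenAI's implementation: the word-count threshold `MAX_INLINE_WORDS`, the bag-of-words stand-in for embeddings, and every function name are assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical threshold: documents with at most this many words are
# passed into the prompt whole instead of being searched.
MAX_INLINE_WORDS = 50


def bow_vector(text):
    """Toy stand-in for an embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=20):
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_context(document, query, top_k=2):
    """Return (strategy, context) for one document and one query."""
    if len(document.split()) <= MAX_INLINE_WORDS:
        # Short document: put the whole thing in the prompt.
        return "inline", document
    # Long document: rank chunks by similarity to the query, keep the best.
    q = bow_vector(query)
    ranked = sorted(chunk(document),
                    key=lambda c: cosine(bow_vector(c), q),
                    reverse=True)
    return "vector_search", "\n".join(ranked[:top_k])
```

A real system would swap `bow_vector` for a learned embedding model and count tokens rather than words; the branching logic is the only part this sketch is meant to show.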


sjnair96 954 days ago

Really cool to see the Assistants API's nuanced document retrieval methods. Do you index over the text besides chunking it up and generating embeddings? I'm curious about the indexing and the depth of analysis for longer docs, like assessing an author's tone chapter by chapter—vector search might have its limits there. Plus, the process to shape user queries into retrievable embeddings seems complex. Eager to hear more about these strategies, at least what you can spill!


riku_iki 954 days ago

> or performs a vector search for longer documents

So clients upload all their docs to OpenAI's database?


karmasimida 954 days ago

Embeddings are a poor man's context length increase. They essentially increase your context length, but with loss.

There is still a cost argument to make: an embedding-based approach will be cheaper and faster, but give worse results than full text.

That being said, I don't see how those embedding startups compete with OpenAI; no one will be able to offer better embeddings than OpenAI itself. It is hardly a convincing business.

The elephant in the room is that the open source models aren't able to match up to OpenAI's models, and the gap is qualitative, not quantitative.
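To put rough numbers on the cost argument above, here is a back-of-the-envelope sketch comparing prompt size per query for the two approaches. The document size, chunk size, and `top_k` are hypothetical figures, not measurements, and the function name is made up for illustration.

```python
def prompt_tokens_per_query(doc_tokens, question_tokens,
                            chunk_tokens=None, top_k=None):
    """Tokens sent to the model for one query.

    Without chunking arguments the full document is sent; otherwise
    only the top_k retrieved chunks are sent alongside the question.
    """
    if chunk_tokens is None or top_k is None:
        return doc_tokens + question_tokens
    return min(doc_tokens, chunk_tokens * top_k) + question_tokens


# Hypothetical workload: a 100,000-token document and a 50-token question.
full_text = prompt_tokens_per_query(100_000, 50)
retrieved = prompt_tokens_per_query(100_000, 50, chunk_tokens=500, top_k=4)

print(f"full text: {full_text} tokens, retrieval: {retrieved} tokens")
print(f"full text sends about {full_text / retrieved:.0f}x more tokens")
```

This only counts input tokens. It says nothing about answer quality, which is the other half of the tradeoff being argued.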


estreeper 954 days ago

For embeddings specifically, there are multiple open source models that outperform OpenAI’s best model (text-embedding-ada-002), as you can see on the MTEB Leaderboard [1].

> embedding-based approach will be cheaper and faster, but worse result than full text

I’m not sure results would be worse; I think it depends on the extent to which the models are able to ignore irrelevant context, which is a known problem [2]. Using retrieval can come closer to providing only relevant context.

1. https://huggingface.co/spaces/mteb/leaderboard

2. https://arxiv.org/abs/2302.00093


karmasimida 954 days ago

> on the MTEB Leaderboard

The point isn't about the leaderboard. With increasing context length, the question is whether we need embeddings at all. With longer context lengths, embeddings are no longer a necessity, which lowers their value.


civilitty 953 days ago

For more trivial use cases, sure, but not for harder stuff like working with US law and precedent.

The US Code is on the order of tens of millions of tokens and I shudder to think how many billions of tokens make up all the judicial opinions that set or interpreted precedent.
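A quick sanity check on that scale, as a sketch: the 20-million-token corpus is a hypothetical figure picked from the "tens of millions" range above, and the 128,000-token window is an assumed size, not something stated in this thread.

```python
import math

CONTEXT_WINDOW_TOKENS = 128_000  # assumed context window size
CORPUS_TOKENS = 20_000_000       # hypothetical corpus in the "tens of millions"


def windows_needed(corpus_tokens, window_tokens):
    """How many full context windows it takes to read the corpus once."""
    return math.ceil(corpus_tokens / window_tokens)


def fraction_visible(corpus_tokens, window_tokens):
    """Share of the corpus that fits into a single window."""
    return min(1.0, window_tokens / corpus_tokens)


n = windows_needed(CORPUS_TOKENS, CONTEXT_WINDOW_TOKENS)
share = fraction_visible(CORPUS_TOKENS, CONTEXT_WINDOW_TOKENS)
print(f"{n} windows needed; one window sees {share:.2%} of the corpus")
```

Under these assumptions a single call sees well under 1% of the text, so something still has to decide which slice goes in.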


lazzlazzlazz 954 days ago

OP is incorrect. Embeddings are still needed since (1) context windows can't contain all data and (2) data memorization and continuous retraining are not yet viable.


zwily 954 days ago

But the common use case of using a vector DB to pull in augmentation appears to now be handled by the Assistants API. I haven't dug into the details yet but it appears you can upload files and the contents will be used (likely with some sort of vector searching happening behind the scenes).


nextworddev 954 days ago

"yet"


coding123 954 days ago

It's also much slower. LLMs generate text one token at a time. That's not very good for search.

Pre-search tokenization, however, is probably a good fit for LLMs.


emadabdulrahim 954 days ago

I believe their API can be stateful now: https://openai.com/blog/new-models-and-developer-products-an...
