
by dinobones 696 days ago

Lol more like, Perplexity has a terminal diagnosis.

Doing RAG with prompt hacking plus text embeddings and a vector store, when you have no access to the underlying model and no ability to fine-tune generation for RAG, will fail. It will fail in epic fashion compared to doing RAG the right way.

What do I mean by RAG the right way? The RAG term has been overloaded.

There's RAG that's just bolted onto the LLM after it's been fine-tuned for instruction following, and then there's RAG where document/fact retrieval is part of the model itself, differentiable and optimized end to end.

Almost everyone is doing the first "hacky" kind of RAG, but in 2020 Meta published the "correct" way to do RAG, where you include a neural retriever in the feedback loop.

Almost no one is doing this because it's more expensive (it requires fine-tuning the model), but it will produce much better results than "bolted-on" RAG.

https://ai.meta.com/blog/retrieval-augmented-generation-stre...
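
To make "retriever in the feedback loop" concrete, here's a toy sketch. It assumes the formulation (as I understand the 2020 work) where the answer likelihood is marginalized over the top-k retrieved documents, p(y|x) = sum over z of p(z|x) * p(y|x,z). All numbers and function names below are made up for illustration; this is not anyone's actual training code. The point is only that the answer loss produces a gradient on the retriever's scores, which a bolted-on retriever never receives.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_likelihood(retrieval_scores, gen_likelihoods):
    """p(y|x) = sum_z p(z|x) * p(y|x,z) over the retrieved documents."""
    p_doc = softmax(retrieval_scores)
    return sum(p * g for p, g in zip(p_doc, gen_likelihoods))

def grad_log_likelihood(retrieval_scores, gen_likelihoods):
    """Analytic gradient: d log p(y|x) / d score_i = p(z_i|x) * (p(y|x,z_i) / p(y|x) - 1)."""
    p_doc = softmax(retrieval_scores)
    p_y = sum(p * g for p, g in zip(p_doc, gen_likelihoods))
    return [p * (g / p_y - 1.0) for p, g in zip(p_doc, gen_likelihoods)]

# Toy setup (hypothetical numbers): three retrieved documents with equal retriever scores.
scores = [0.0, 0.0, 0.0]
# Hypothetical generator likelihoods of the correct answer given each document.
gen = [0.9, 0.1, 0.2]

p_y = marginal_likelihood(scores, gen)
grads = grad_log_likelihood(scores, gen)

# The document that helps the generator gets a positive gradient on its score;
# the unhelpful ones get negative gradients.
for i, g in enumerate(grads):
    print(f"doc {i}: d log p / d score = {g:+.4f}")
```

In the bolted-on setup the retrieval scores are fixed inputs, so this gradient exists on paper but nothing uses it.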

Perplexity does not have access to any GPT model weights. It's unlikely they'll be able to compete on quality.

It's game over.

3 comments

inertiatic 696 days ago

Didn't we have multiple open-weight frontier model announcements?


viraptor 696 days ago

I'm not sure where the assumption that it "will produce much better results" comes from. Fine-tuning is not that predictable. Maybe some documents are remembered, maybe not. Maybe the document markers are preserved, maybe they fail. And adding anything new risks destroying existing data and is expensive.

Compare that to vector + graph search, which is almost free to add to (if you're searching the internet, you're adding N documents per minute, not N per several days of training), is repeatable, and doesn't affect existing data. It would be cool to have neural search, but how realistic is it without making it extremely fuzzy, forgetful, and expensive?
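
A minimal sketch of the "almost free to add to" point, assuming a plain append-only in-memory index. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and `VectorIndex` is a hypothetical class, not any particular library's API.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorIndex:
    """Append-only index: adding a document never touches older entries."""

    def __init__(self):
        self.docs = []
        self.vectors = []

    def add(self, text):
        self.docs.append(text)
        self.vectors.append(embed(text))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(
            zip(self.docs, self.vectors),
            key=lambda dv: cosine(q, dv[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:k]]

index = VectorIndex()
index.add("The retriever selects documents relevant to the query.")
index.add("Fine tuning updates model weights and can overwrite earlier knowledge.")
before = list(index.vectors)

# A new document arrives: indexing it is a single append, with no retraining.
index.add("Perplexity answers questions by searching the web.")
hits = index.search("searching the web")
print(hits[0])
```

The earlier vectors are byte-for-byte what they were before the third `add`, which is the "not affecting existing data" property; a fine-tuning step gives no such guarantee.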


ldjkfkdsjnv 696 days ago

Actually, I disagree. I think the barrier to entry on a decent RAG system is very low. Embeddings have gotten so good that retrieving chunks is going to be commoditized. Neural search was needed when embedding models were not good enough.
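
For a sense of how low that barrier is, here's a rough sketch of the whole "retrieve chunks, stuff them in the prompt" path. Everything in it is an assumption for illustration: `embed` is a toy word-count stand-in for a real embedding model, the chunk sizes are arbitrary, and the prompt wording is made up.

```python
import math
import re
from collections import Counter

def chunk(text, size=12, overlap=4):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]

def embed(text):
    # Toy stand-in for a real embedding model (assumption): word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question, document, k=2):
    """Rank chunks by similarity to the question and paste the top k into a prompt."""
    q = embed(question)
    top = sorted(chunk(document), key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
    context = "\n".join(f"- {c}" for c in top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

document = (
    "Embedding models map text to vectors so that similar passages land close together. "
    "A vector store keeps those vectors and returns the nearest ones for a query. "
    "The language model then answers using the retrieved passages as context."
)
question = "What does a vector store do?"
chunks = chunk(document)
prompt = build_prompt(question, document)
print(prompt)
```

Swapping the toy `embed` for a real embedding model is the only part that costs anything, which is the commoditization argument in a nutshell.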
