by eternityforest 477 days ago
RAG seems to work with 0.5B and 1.5B models just fine a lot of the time; it just can't handle anything that's not directly spelled out in the documents.

Or, at least it seems to in the limited amount of testing I did in a weekend. I'm an embedded dev without any real AI experience or an actual use case for building a RAG at the moment.

2 comments

RAG is basically a fancy name for augmenting a prompt with data.

Companies are being sold the idea that they can augment their LLM with their massive unstructured dataset, but it's all wishful thinking.

Yeah, LLM capabilities are measured with fresh context windows, yet people want to use them with 50k, 100k, 500k tokens.

As you pack in more and more context the model's abilities really start to deteriorate.

The first 10k tokens are the juiciest; after that it just gets worse and worse.

Oh wow, I was thinking 500 tokens was way too much, since I've only ever done anything programmatic with tiny models on CPUs...

That's essentially what an embedding model is - a smaller, faster model that's good at finding information quickly. Then you feed that to a larger, more powerful reasoning model to synthesize and you've invented RAG.
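
To make the retrieve-then-synthesize split concrete, here is a minimal sketch of the first half. It assumes a toy bag-of-words `embed` function as a stand-in for a real embedding model, and the sample documents and every function name are made up for illustration. The returned prompt is what you would hand to the larger model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=1):
    """Augment the question with retrieved text; this string goes to the LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical document store, invented for this example.
docs = [
    "The watchdog timer resets the board after 500 ms without a kick.",
    "The UART runs at 115200 baud with no parity.",
    "Firmware updates are delivered over USB DFU.",
]

prompt = build_prompt("What baud rate does the UART use?", docs)
print(prompt)
```

Keeping `k` small is what keeps the prompt short, which matters a lot when the model at the end is tiny and running on a CPU.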

In my limited weekend testing with just a CPU, the 1.5B model is the larger and more powerful model at the end!

I'm definitely excited to see what new applications are possible with NPUs, when we can run this stuff for real on hardware that people other than enthusiasts can afford, without waiting 40 seconds.