HN Mirror

	by eternityforest 477 days ago
	RAG seems to work just fine with 0.5B and 1.5B models a lot of the time; it just can't handle anything that isn't directly spelled out in the documents. Or at least that's how it seemed in the limited testing I did over a weekend. I'm an embedded dev without any real AI experience or an actual use case for building a RAG system at the moment.

2 comments

Foobar8568 477 days ago

RAG is basically a fancy name for augmenting a prompt with data.

Companies are being sold on the idea that they can augment their LLM with their massive unstructured datasets, but it's all wishful thinking.
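For anyone unfamiliar, the "augment a prompt with data" step can be as simple as pasting retrieved text ahead of the question. A minimal Python sketch, assuming the passages have already been retrieved somehow; `augment_prompt` and the sample passages are made up for illustration, and the instruction wording is just one possible template:

```python
def augment_prompt(question: str, passages: list[str]) -> str:
    """Prepend retrieved passages to the user's question (hypothetical helper)."""
    # Number each passage so the model (and a human reader) can tell them apart.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up passages standing in for whatever a retriever returned.
prompt = augment_prompt(
    "What voltage does the board run at?",
    ["The board runs at 3.3 V.", "The MCU has 256 KB of flash."],
)
print(prompt)
```

Nothing about the model itself changes here; only the prompt does.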

Workaccount2 477 days ago

Yeah, LLM capabilities are measured with fresh context windows, yet people want to use them with 50k, 100k, or 500k tokens of context.

As you pack in more and more context, the model's abilities really start to deteriorate.

The first 10k tokens are the juiciest; after that it just gets worse and worse.
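One way to act on this in a RAG pipeline is to cap how much retrieved text goes into the prompt rather than stuffing in everything. A rough Python sketch, assuming passages arrive already ranked best-first; `fit_to_budget` is a hypothetical helper, the 10k default just mirrors the figure above, and word counts are a crude stand-in for a real tokenizer, so actual token counts will differ:

```python
def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def fit_to_budget(ranked_passages: list[str], budget: int = 10_000) -> list[str]:
    """Keep the highest-ranked passages whose combined size stays within budget."""
    kept: list[str] = []
    used = 0
    for passage in ranked_passages:
        cost = approx_tokens(passage)
        if used + cost > budget:
            break  # stop at the first passage that would overflow the budget
        kept.append(passage)
        used += cost
    return kept


passages = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(fit_to_budget(passages, budget=5))  # ['alpha beta gamma', 'delta epsilon']
```

Stopping at the first passage that overflows (rather than skipping it and trying later ones) keeps the ranking order intact; skipping would pack the budget more tightly at the cost of including lower-ranked text.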

eternityforest 477 days ago

Oh wow, I was thinking 500 tokens was way too much, since the only programmatic work I've done is with tiny models on CPUs...

serjester 477 days ago

That's essentially what an embedding model is: a smaller, faster model that's good at finding information quickly. Then you feed what it finds to a larger, more powerful reasoning model to synthesize, and you've invented RAG.
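Here is a toy version of the retrieval half of that split in Python. Word-count vectors and cosine similarity stand in for a real embedding model, purely so the sketch runs with only the standard library; `embed`, `cosine`, `retrieve`, and the sample docs are all hypothetical:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "the uart runs at 115200 baud",
    "flash memory is 256 kb",
    "the spi bus runs at 8 mhz",
]
print(retrieve("what baud does the uart use", docs, k=1))  # ['the uart runs at 115200 baud']
```

The top-k results would then go into the larger model's prompt for synthesis; that generation step is left out here. A real embedding model would also match paraphrases ("serial speed" vs. "baud"), which this word-overlap version can't.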

eternityforest 476 days ago

In my limited weekend testing with just a CPU, the 1.5B model was the larger, more powerful model at the end of the pipeline!

I'm definitely excited to see what new applications become possible with NPUs, once we can run this stuff for real on hardware that people other than enthusiasts can afford, without waiting 40 seconds.
