HN Mirror


	by serjester 478 days ago
	We tried something similar and found much better results with o1 pro than with o3 mini. RAG seems to require a level of world knowledge that the mini models don't have. The trade-off is significantly higher latency and cost, but for us answer quality is a much higher priority.

eternityforest 478 days ago

RAG seems to work with 0.5B and 1.5B models just fine a lot of the time; it just can't handle anything that's not directly spelled out in the documents.

Or, at least it seems to in the limited amount of testing I did in a weekend. I'm an embedded dev without any real AI experience or an actual use case for building a RAG at the moment.

Foobar8568 478 days ago

RAG is basically a fancy name for augmenting a prompt with data.

Companies are being sold the idea that they can augment their LLM with their massive unstructured dataset, but it's all wishful thinking.

Workaccount2 478 days ago

Yeah, LLM capabilities are measured with fresh context windows, yet people want to use them with 50k, 100k, 500k tokens.

As you pack in more and more context, the model's abilities really start to deteriorate.

The first 10k tokens are the juiciest; after that it just gets worse and worse.

eternityforest 478 days ago

Oh wow, I was thinking 500 tokens was way too much, since I've only ever done anything programmatic with tiny models on CPUs...

serjester 477 days ago

That's essentially what an embedding model is - a smaller, faster model that's good at finding information quickly. Then you feed that to a larger, more powerful reasoning model to synthesize and you've invented RAG.
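
A minimal sketch of that retrieve-then-synthesize flow, for illustration only. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, the helper names and sample documents are made up, and the final call to a larger model is left out; the sketch only builds the augmented prompt that would be sent to it.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=1):
    """Augment the question with the retrieved text; a larger model would answer this."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical documents, purely for demonstration.
docs = [
    "The pump controller resets when the supply voltage drops below 4.5 V.",
    "Firmware updates are delivered over the serial bootloader.",
    "The enclosure is rated IP67 for outdoor use.",
]
prompt = build_prompt("Why does the pump controller reset?", docs)
print(prompt)
```

In a real setup the toy `embed` would be swapped for an actual embedding model and `prompt` would be passed to the reasoning model.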

eternityforest 477 days ago

In my limited weekend testing with just a CPU, the 1.5B model is the larger and more powerful model at the end!

I'm definitely excited to see what new applications are possible with NPUs, when we can run this stuff for real on hardware that people other than enthusiasts can afford, without waiting 40 seconds.

emil_sorensen 478 days ago

Super cool! Yep, a lot seems to get lost through distillation.
