by serjester 477 days ago
That's essentially what an embedding model is - a smaller, faster model that's good at finding information quickly. Then you feed that to a larger, more powerful reasoning model to synthesize and you've invented RAG.
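To make the retrieve-then-synthesize idea concrete, here is a minimal sketch, not anyone's actual pipeline. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and `DOCS` are hypothetical names made up for illustration. The final step, handing the prompt to a larger reasoning model, is left out because it depends on whichever model you run.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Cheap, fast step: rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=2):
    """Pack the retrieved text into a prompt for the larger model."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical corpus, for illustration only.
DOCS = [
    "NPUs accelerate neural network inference on consumer laptops",
    "Sourdough bread needs a long fermentation",
    "Embedding models map text to vectors for fast similarity search",
]

if __name__ == "__main__":
    print(build_prompt("how do embedding models search text", DOCS, k=1))
```

The split is the point: the retrieval step only has to be fast and roughly right, because the bigger model reading the prompt does the actual reasoning.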
1 comment

In my limited weekend testing with just a CPU, the 1.5B model is the larger and more powerful model at the end!

I'm definitely excited to see what new applications become possible with NPUs, once we can run these models for real on hardware that people other than enthusiasts can afford, without waiting 40 seconds.