by popinman322 854 days ago
vs RAG: RAG is good for searching across billions of tokens or more and for providing up-to-date information to a static model. Even with huge context lengths, it's a good idea to submit high-quality inputs to prevent the model from going off on tangents, getting stuck on contradictory information, etc.
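
To make the "submit high-quality inputs" point concrete, here is a rough sketch of the retrieval step: score each passage against the query, drop anything below a relevance floor, and pass only the top few to the model. Everything in it is an assumption for illustration: the `retrieve` helper, the bag-of-words cosine scoring, the `min_score` threshold, and the sample passages. A real setup would more likely use embeddings and a vector index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2, min_score=0.1):
    # Hypothetical helper: keep at most k passages scoring at least min_score.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored = [(s, p) for s, p in scored if s >= min_score]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


# Made-up passages, purely for illustration.
passages = [
    "The 2024 release raised the context window to one million tokens.",
    "Our cafeteria menu changes every Tuesday.",
    "Context window size limits how many tokens the model can read at once.",
]

context = retrieve("how large is the context window in tokens", passages)
prompt = "Answer using only these notes:\n" + "\n".join(context)
```

The irrelevant passage never reaches the prompt, which is the point: even if the model could fit all three, filtering first gives it less to get distracted by.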

vs fine-tuning: smaller, fine-tuned models can perform better than huge models on a decent number of tasks. It's not strictly fine-tuning, but for throughput-limited tasks it'll likely still be better to prune a 70B model down to 2B, keeping only the components you need for accurate inference.

I can see this model being good for taking huge inputs and compressing them down for smaller models to use.