Hacker News
by 2001zhaozhao 22 days ago
At some point we have to be running into some inherent mathematical limits of knowledge compression, right? No way the knowledge benchmarks on these 8B models will keep getting better without overfitting on these benchmarks.
2 comments

If you give the model access to specialized tools (e.g. web search for question answering) the knowledge doesn't have to be stored in the model weights, which leaves some room for improvement. You'd still be overfitting to benchmarks (since different tasks might require different tools) but not necessarily to specific benchmark questions, so within-domain generalization could be quite good.

As an example of a similar approach, Teapot AI has trained very small models (https://teapotai.com/models) to only answer questions where the answer can be found within the context window, and although not perfect, they do quite well at this compared to larger, more general models.
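
To make the "answer only from the context window" idea concrete, here is a toy sketch. It is not how Teapot AI's models work: a simple word-overlap heuristic stands in for the model. The names `answer_from_context`, `answer_with_tool`, `fake_search`, and the `min_overlap` threshold are all hypothetical, made up for illustration. The point is only the shape of the pipeline: a tool supplies the text, and the answerer refuses when the text does not support an answer.

```python
import re

# Small, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {
    "a", "an", "the", "is", "was", "were", "are", "of", "in", "on", "to",
    "it", "who", "what", "when", "where", "which", "how", "did", "do", "does",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_from_context(question, context, min_overlap=2):
    """Return the context sentence sharing the most content words with the
    question, or None (a refusal) if no sentence shares at least
    min_overlap words. A stand-in for a context-bound model."""
    question_words = tokenize(question) - STOPWORDS
    best_sentence, best_score = None, 0
    for sentence in re.split(r"(?<=[.!?])\s+", context.strip()):
        score = len(question_words & tokenize(sentence))
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence if best_score >= min_overlap else None


def answer_with_tool(question, search):
    """Fetch context through a search callable, then answer only from it."""
    context = search(question)
    result = answer_from_context(question, context)
    return result if result is not None else "I don't know."


def fake_search(question):
    """Hypothetical search tool: always returns the same tiny document."""
    return "The Eiffel Tower was completed in 1889. It is located in Paris."


print(answer_with_tool("When was the Eiffel Tower completed?", fake_search))
print(answer_with_tool("Who painted the Mona Lisa?", fake_search))
```

Nothing here is stored in "weights": swap out `fake_search` and the same answerer covers a different domain, which is the kind of within-domain generalization described above.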

Good point. I have the feeling larger models (20B+) rely too much on their stored knowledge and sometimes fail to use tools because they think they know the answer. Smaller, specialized tool-calling models could be the smart route for the future.

Yeah, it's strange: all that stealing of every possible book, and then lobbying for laws prohibiting... something.

Humans train a "thinking methodology" first, and then know how to use it while accessing data to build knowledge.

Humans do not memorize all text in existence at once; that would be totally stupid.

Humans who can already think specialize in disciplines (math, chemistry, IT, cooking, etc.) while still using new data.

All of that computing is local: on the LAN of the brain.

So if some "agents" want to help, there is zero need for computation outside the home/corporation/car local area network.

Licenses ??