by bitexploder 8 days ago
Don't LLMs work on attention though? The closer you can land your problem, in their hyperdimensional space, to their inherent understanding, the better they are at understanding your problem domain. RAG loops can be very slow, and agents may simply lack the knowledge to use them correctly.

But, in short, the ability to manage information, to process it properly, is more important in this regard than just having the information. "Having" more knowledge is no guarantee of "using" it better.

And to improve reliability, if the machine can check, it will have to check. "Costly" cannot be an excuse.

Understanding a specific problem space can be a prerequisite for forming a proper query (i.e. for asking the correct question).

A model doesn't know what it doesn't know.

Your suggestion is not clear. Yes, we reason and define relevant details (maybe through further information retrieval) to better construct queries; that is what the Analytical school of thought taught and insisted on. Even more crucial is that the subsequent delegated steps, of constructing replies, also imply reasoning and information retrieval.

Said abilities - intellectual strength - are immensely more important than notions. The relation between network size and intellectual strength, vs network size and notions (original topic in this branch), is presumably not yet that clear. Intelligent models may not necessarily be embedded with explicit information of everything, though they will have to have ways to reach that upon contingent necessity (to solve specific problems). Like us.

I agree with what you said. I just wanted to add that intelligent models probably need to have some notions embedded (but not everything), as some information retrieval is not trivial. Too few embedded notions will hurt a model's ability to solve problems, but from some point onward you'll get diminishing returns (where it starts to make sense to rely just on information retrieval).

For example, say you instruct a model to create a decoder for some data type that users will upload to your website. The intelligent model without notions will retrieve information about that data type and build a working decoder, but it might miss from context that users uploading to a website means untrusted input, and thus won't even try to gather information about what needs to be done to securely handle such uploaded data.
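
To make the decoder example concrete, here is a minimal sketch of what "treating the upload as untrusted" could look like. The record format (a 4-byte big-endian length followed by UTF-8 text), the function name and the size cap are all hypothetical, invented purely for illustration; a real data type would need its own checks.

```python
import struct

MAX_PAYLOAD = 1024  # hypothetical cap; choose one that suits the application


def decode_untrusted(blob: bytes) -> str:
    """Decode a hypothetical record: 4-byte big-endian length, then UTF-8 text.

    Every field that comes from the uploader is checked before it is used.
    """
    if len(blob) < 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack(">I", blob[:4])
    # Never trust the declared length: bound it and compare it to reality.
    if length > MAX_PAYLOAD:
        raise ValueError("declared length exceeds cap")
    payload = blob[4:]
    if len(payload) != length:
        raise ValueError("declared length does not match payload")
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("payload is not valid UTF-8") from exc
```

A decoder that only had to "work" would read the length and slice; the extra checks are the part that depends on already knowing that website uploads are hostile by default.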

Or say you give it the task of translating text into a language it didn't encounter during training. You can provide it with grammar rules and a dictionary for information retrieval, but I guess it won't perform as well as an intelligent model that already has some fundamental notions of that language and only needs a dictionary to expand its vocabulary.

GPT-4.1 only knows a lot of patterns, but doesn't have the reasoning intelligence that would help it properly use that knowledge. So a small reasoning model can easily beat it in a lot of tasks. The question is how new small reasoning models will compare, 14 months from now, to current big reasoning models.

How much information needs to be embedded is not yet clear, but currently, bigger reasoning models are still better at complex tasks than small reasoning models. Either the sweet spot of embedded notions is higher than what current small models have, or information retrieval ability needs to improve.