Hacker News new | ask | show | jobs
by Zigurd 296 days ago
The weights, so to speak, come from the knowledge base. That means you can't get away from the quality of the knowledge base. That isn't uniform across all domains of knowledge. Then the problem becomes how do you make the training material uniformly high-quality in every knowledge domain? At best it becomes the meta problem of determining the quality of knowledge in some way that makes an LLM able to calibrate confidence to a knowledge domain. But more likely we're stuck with the dubious quality that comes from human bias and wishful thinking in supposedly authoritative material.
3 comments

Sure, it's only as good as the training data. But human experts also output tokens with some statistical distribution. That doesn't mean anything.
That sounds plausible. But it doesn't explain why LLM's make laughably bad errors that even a biased and haphazard human researcher wouldn't make.
Gemini seems to have a user interface that, for the way most people encounter Gemini, is more closely linked to search results. This leads me to suspect that Google's approach to training could be uniquely informed by both current and historic web crawling.
I think that's been a lot less true over the last year or so. Gemini 2.5 Pro is the first LLM I actually find pretty damn reliable.
If you think talking to an LLM is the same experience as talking to a human you should probably talk to more humans
That's not what I said. What I said is that the claim "LLMs aren't intelligent because they stochastically produce characters" doesn't hold because humans do that too even if they're intelligent and authorative.
We don't actually know how human cognition works, so how do you know that humans "stochastically produce characters?"
Do humans always answer exactly the same way to the same question? No.

Also you could always pick the most likely token in an LLM as well to make it deterministic if you really wanted.

That doesn't really prove anything. I could create a Markov chain with a random seed that doesn't always answer the same question the same way, but that doesn't prove the human brain works like a Markov chain with a random seed.

One thing humans tend not to do is confabulate entirely to the degree that LLMs do. When humans do so, it's considered a mental illness. Simply saying the same thing in a different way is not the same as randomly randomly syntactically correct nonsense. Most humans will not, now and then, answer that 2 + 2 = 5, or that the sun rises in the southeast.

MCP and agents seem like a solutions but as far as I know maintaining sufficient context is still a problem

I.e. ability to plug in expert data sources

Find tuning and RAG should, in theory, enable applications of LLM's to perform better in specific knowledge, domains, by focusing annotation of knowledge on the domains specific to the application.
I think youre missing the point. The issue is not the amount of knowledge it possesses. The problem is that theres no way to go from "statistically generate the next word" to "what is your confidence level in the fact you just stated". Maybe, with an enormous amount of computation we could layer another AI on top to evaluate or add confidence intervals, but I just dont see how we get there wihthout another quantum leap.
Of course there is. If its training forces it to develop a theory of mind then it will weight the dice so that it's more likely to output "I don't know". Most likely the culprit is that it's hard to make training data for things that it doesn't know.