Hacker News
by _Algernon_ 297 days ago
Rolling weighted dice repeatedly to generate words isn't factually accurate. More at 11.
1 comments

It is if the weights are sufficiently advanced.
I find such statements frightening. Too many people cannot tell the difference between prevalence ("everybody does it") and factual correctness.
Nothing to do with dice though.
The whole "stochastic means to find factual correctness" thing is an error of method; arguing about weights here is nonsense.
It isn't though, the most factually correct human expert is also stochastic. The only question is how the dice are weighted.
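To make the dice metaphor concrete, here is a minimal sketch of sampling a next word from a weighted table, and of how reweighting changes what comes out. The word list and weights are made up for illustration, and `roll_weighted_dice` and `sharpen` are hypothetical helper names, not any real model's API.

```python
import random

def roll_weighted_dice(weights, rng):
    """Sample one word from a {word: weight} table, like rolling a loaded die."""
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=1)[0]

def sharpen(weights, temperature):
    """Raise each weight to 1/temperature and renormalize.

    A temperature below 1 concentrates probability on the heaviest word.
    """
    powered = {w: p ** (1.0 / temperature) for w, p in weights.items()}
    total = sum(powered.values())
    return {w: p / total for w, p in powered.items()}

# Hypothetical next-word weights for "The capital of France is ..."
next_word = {"Paris": 0.90, "Lyon": 0.07, "Berlin": 0.03}

rng = random.Random(0)  # fixed seed so repeated runs give the same rolls
samples = [roll_weighted_dice(next_word, rng) for _ in range(1000)]
print("share of 'Paris':", samples.count("Paris") / 1000)

sharp = sharpen(next_word, temperature=0.25)
print("weight on 'Paris' after sharpening:", round(sharp["Paris"], 4))
```

The process is stochastic either way; how often it lands on the right word depends entirely on the weights.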
"human expert" as reference for "factually correct", oh just gently caress yourself. Appeal to authority (expert = social status) is as much bullshit as appeal to popularity.
The weights, so to speak, come from the knowledge base. That means you can't get away from the quality of the knowledge base. That isn't uniform across all domains of knowledge. Then the problem becomes how do you make the training material uniformly high-quality in every knowledge domain? At best it becomes the meta problem of determining the quality of knowledge in some way that makes an LLM able to calibrate confidence to a knowledge domain. But more likely we're stuck with the dubious quality that comes from human bias and wishful thinking in supposedly authoritative material.
Sure, it's only as good as the training data. But human experts also output tokens with some statistical distribution. That doesn't mean anything.
That sounds plausible. But it doesn't explain why LLMs make laughably bad errors that even a biased and haphazard human researcher wouldn't make.
Gemini seems to have a user interface that, for the way most people encounter Gemini, is more closely linked to search results. This leads me to suspect that Google's approach to training could be uniquely informed by both current and historic web crawling.
I think that's been a lot less true over the last year or so. Gemini 2.5 Pro is the first LLM I actually find pretty damn reliable.
If you think talking to an LLM is the same experience as talking to a human you should probably talk to more humans
That's not what I said. What I said is that the claim "LLMs aren't intelligent because they stochastically produce characters" doesn't hold, because humans do that too, even when they're intelligent and authoritative.
We don't actually know how human cognition works, so how do you know that humans "stochastically produce characters"?
MCP and agents seem like a solution, but as far as I know, maintaining sufficient context is still a problem

I.e., the ability to plug in expert data sources

Fine-tuning and RAG should, in theory, enable applications of LLMs to perform better in specific knowledge domains by focusing annotation of knowledge on the domains specific to the application.
I think you're missing the point. The issue is not the amount of knowledge it possesses. The problem is that there's no way to go from "statistically generate the next word" to "what is your confidence level in the fact you just stated". Maybe, with an enormous amount of computation, we could layer another AI on top to evaluate or add confidence intervals, but I just don't see how we get there without another quantum leap.
Of course there is. If its training forces it to develop a theory of mind then it will weight the dice so that it's more likely to output "I don't know". Most likely the culprit is that it's hard to make training data for things that it doesn't know.
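For illustration only, here is a toy sketch of the simplest version of this idea: read the top next-token probability as a crude confidence score and answer "I don't know" when it falls below a cutoff. This is inference-time thresholding, not the training-based approach described above. The logits, the `threshold` value, and the function names are all assumptions invented for the example, and raw token probabilities are not guaranteed to be well calibrated.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def answer_or_abstain(logits, threshold=0.6):
    """Return (answer, confidence); abstain when the top probability is low."""
    probs = softmax(logits)
    best = max(probs, key=probs.get)
    confidence = probs[best]
    if confidence < threshold:
        return "I don't know", confidence
    return best, confidence

# Hypothetical scores: one peaked distribution, one nearly flat.
confident = {"Paris": 6.0, "Lyon": 1.0, "Berlin": 0.5}
unsure = {"1887": 1.2, "1889": 1.0, "1891": 0.9}

print(answer_or_abstain(confident))
print(answer_or_abstain(unsure))
```

The peaked distribution yields an answer, while the flat one falls below the cutoff and abstains. Whether that number reflects real confidence in a stated fact is exactly what is in dispute here.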