| HN Mirror


by mtokarski 255 days ago

Interesting work, but I think the interpretation may be a bit overstated. The authors claim that injecting too much factual "knowledge" during pretraining causes models to collapse — performance drops below the baseline once knowledge frequency crosses a threshold.

The problem is how they inject it. Their “knowledge” isn’t natural language; it’s templated Wikidata triples like "X is the capital of Y." That’s a super low-entropy, highly repetitive distribution. When you cram enough of that into a fixed token budget, you’re not really teaching the model more facts — you’re just destroying linguistic diversity and skewing the token statistics.
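To make the "low-entropy" point concrete, here is a rough sketch that compares the unigram entropy of templated triples with hand-written varied sentences carrying the same facts. Everything in it is an assumption for illustration: the four capital/country pairs, the varied sentences, and the helper `unigram_entropy` are made up here and are not the paper's data or metric. Whitespace tokenization on a tiny corpus is a crude proxy for what a real tokenizer would see.

```python
import math
from collections import Counter

def unigram_entropy(texts):
    """Shannon entropy in bits per token of the whitespace-token distribution."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical facts, chosen only for illustration.
pairs = [("Paris", "France"), ("Rome", "Italy"), ("Tokyo", "Japan"), ("Cairo", "Egypt")]

# One fixed template, in the style of "X is the capital of Y".
templated = [f"{city} is the capital of {country}" for city, country in pairs]

# The same facts embedded in varied, hand-written sentences.
varied = [
    "France is governed from Paris, its capital city",
    "Rome, the capital of Italy, sits on the Tiber",
    "Japan moved its imperial seat to Tokyo, now the national capital",
    "Visitors to Egypt usually land in Cairo, the country's capital",
]

templated_bits = unigram_entropy(templated)
varied_bits = unigram_entropy(varied)

if __name__ == "__main__":
    print(f"templated: {templated_bits:.2f} bits/token")
    print(f"varied:    {varied_bits:.2f} bits/token")
```

The templated corpus spends most of its tokens on the same four function words, so its distribution is much more peaked. Scaling that up inside a fixed token budget is the skew described above.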

In real pretraining or domain adaptation scenarios, “knowledge” tends to appear in richer, more varied contexts. The practical takeaway isn’t "don’t add too much domain data," but rather "don’t overrepresent any single format or narrow syntactic pattern." The issue seems more about representation homogeneity than about factual density itself.

3 comments

magicalhippo 255 days ago

I'm sure there's other work, but I came across this in the Physics of Language Models paper [1] on knowledge extraction.

Essentially they found that by presenting the knowledge in a single, fixed way, the model is trained to reproduce that exact sequence of tokens, rather than "internalizing" the knowledge.

By varying the sentences, the model instead manages to separate out the knowledge, so to speak. This in turn drastically improves how well they can extract that knowledge later.

[1]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250633
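A minimal sketch of what "varying the sentences" could look like when building a training set. The templates, the `augment` helper, and the example facts are all hypothetical; the thread does not describe the paper's actual augmentation procedure, and a handful of templates is still far less varied than natural text.

```python
import random

# Hypothetical paraphrase templates for a (capital, country) fact.
TEMPLATES = [
    "{capital} is the capital of {country}.",
    "The capital of {country} is {capital}.",
    "{country}'s capital city is {capital}.",
    "If you visit the seat of government in {country}, you are in {capital}.",
]

def augment(facts, n_variants=3, seed=0):
    """Return n_variants differently worded sentences for each fact."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    sentences = []
    for capital, country in facts:
        # Sample distinct templates so each fact is never repeated verbatim.
        for template in rng.sample(TEMPLATES, k=n_variants):
            sentences.append(template.format(capital=capital, country=country))
    return sentences

facts = [("Paris", "France"), ("Rome", "Italy")]
augmented = augment(facts)

if __name__ == "__main__":
    for sentence in augmented:
        print(sentence)
```

Each fact now appears in several surface forms, so reproducing one exact token sequence is no longer enough to fit the data.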


leobg 255 days ago

Of course. Because the unseen part here is that the model is being taught that every other representation of the same fact was wrong.

Meaning, during training, if the model expresses the same fact in some other form, maybe even with just one extra comma, that response will be marked just as wrong as a really wrong one.

In fact, the model may give an answer that’s better than the one in the training set - but it will still be punished for it and forced to change its weights because the answer doesn’t match token-for-token.

We don’t have a loss function for meaning. We only have one for token matching. Anyone who is serious about curating datasets for fine-tuning needs to take this into account.
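Here is a toy sketch of that token-matching framing. It is an assumption-heavy simplification: `token_match_loss`, the vocabulary size, and the confidence value are invented for illustration, and real training uses teacher forcing, where each position is conditioned on the reference prefix rather than compared against a free-running output.

```python
import math

def token_match_loss(predicted, reference, vocab_size=1000, confidence=0.9):
    """Mean per-position cross-entropy of `reference` under a toy model that
    puts `confidence` on its own token at each position and spreads the
    remaining probability uniformly over the rest of the vocabulary."""
    miss = (1.0 - confidence) / (vocab_size - 1)
    losses = []
    for i, ref_tok in enumerate(reference):
        pred_tok = predicted[i] if i < len(predicted) else None
        p = confidence if pred_tok == ref_tok else miss
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

reference = "Paris is the capital of France".split()
exact = "Paris is the capital of France".split()
paraphrase = "The capital of France is Paris".split()   # same fact, reordered
wrong = "Berlin is the capital of France".split()       # false, but near-identical tokens

loss_exact = token_match_loss(exact, reference)
loss_paraphrase = token_match_loss(paraphrase, reference)
loss_wrong = token_match_loss(wrong, reference)

if __name__ == "__main__":
    print(f"exact:      {loss_exact:.3f}")
    print(f"paraphrase: {loss_paraphrase:.3f}")
    print(f"wrong:      {loss_wrong:.3f}")
```

Under this toy scoring, the correct paraphrase is penalized more heavily than the factually wrong sentence, because only positional token overlap counts.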


ijk 255 days ago

That's consistent with other research I've seen, where varied presentation of the data is key to effective knowledge injection [1].

My assumption, based on the research, is that training on different prompts but the same answer gives you more robust Q&A behavior; training on variations of how to express the same concept generalizes. Training on the same prompt and different answers gives you creative diversity [2].

[1] https://arxiv.org/abs/2404.00213

[2] https://arxiv.org/abs/2503.17126


dotancohen 255 days ago

It's the same for humans. This is the main argument against rote memorization.


spankalee 255 days ago

Doesn't this then support the claim that LLMs aren't building world models - where even linguistically simple factual statements should help expand and refine that model - and reinforce the idea that they are still just next token predictors?


simsla 255 days ago

There's no inductive bias for a world model in multiheaded attention. LLMs are incentivized to learn the most straightforward interpretation/representation of the data you present.

If the data you present is low entropy, it'll memorize. You need to make the task sufficiently complex so that memorisation stops being the easiest solution.


andrewflnr 255 days ago

My read is that token prediction requires a more general model to predict more varied tokens, which makes it something closer to a world model. After all, in principle, there's a point where the optimal "token predictor" really is backed by a world model. (Now, is that model feasible to find? Unclear!)


dotancohen 255 days ago

Not unlike humans. Don't believe me? Go ask somebody these questions in quick succession:

  What colour is a tomato?
  What colour is a ruby?
  What colour are lips?
  What colour is a strawberry?
  What colour is blood?
  What colour traffic light do you drive on?


bryzaguy 255 days ago

What a cool demonstration. My automatic response was “red” for traffic light, although a different part of my brain re-evaluated given the context. The question in my mind now: is the automatic response a building block for the latter, or is that orchestration a fully separate system?


godelski 254 days ago

  > Doesn't this then support the claim that LLMs aren't building world models

There's actually no strong evidence that LLMs, or any AI systems, are actually building a world model.

These systems are determined to have "world model" capabilities based on benchmarks, but benchmarks will never be able to tell you if such a feat is taking place. People claim that these systems have world models by testing them for consistency. The thing is that a world model is counterfactual. The problem with benchmarks is that they do not distinguish memorization from generalization. To make things worse, the term "Out of Distribution" (OOD) is rather fuzzy and gets abused quite a bit (I can explain more if anyone wants). Basically, you should not trust any claim of "few shot" or "zero shot", and the truth is that no such claim can be made without deep knowledge of the datasets the models were trained on. It helps to go back to the original zero shot papers.

One bit that might actually help in understanding things is that a world model does not actually need to make correct predictions, which should show a critical flaw in benchmarking these capabilities. You can look to the history of physics and gather many great examples of this. For example, the geocentric model still had predictive power, was counterfactual, and had a lot of accuracy. It was in fact a world model, despite being wrong. There was legitimate pushback to Galileo, specifically over tides[0]. If you like that kind of stuff I highly recommend the podcast "An Opinionated History of Mathematics"[1].

There's a lot more complexity and nuance to this, but I'll say that there's a reason we do physics the way we do it. Benchmarks and empirical evidence play a critical role in developing physics theories and confirming those theories. But they also are not enough to build our models. (You'll also find that physicists commonly dissent from the claim that LLMs have world models. Sure, you'll also find the Max Tegmark types, but in general the consensus is against them, and for good reason.)

Here's a decent paper showing a model being highly accurate yet failing to create an accurate construction of the environment[2]. This can happen when the task diverges from the need to model the world. World modeling is a natural thing for humans and animals to do, because it generalizes exceptionally well, but you need to be careful in evaluating things via benchmarks and to remember that extraordinary claims require extraordinary evidence. I'd say claims of "thinking" or "world modeling" are quite extraordinary claims and we should not be hasty to attribute these characteristics when there are many reasonable and simpler alternative explanations.

[0] https://en.wikipedia.org/wiki/Discourse_on_the_Tides

[1] https://intellectualmathematics.com/opinionated-history-of-m...

[2] https://arxiv.org/abs/2406.03689

[disclosure] I have a PhD in Computer Vision and a BS in physics. I care very much about world modeling as a problem but the response I get from many of my peers is "we just care if it works." It's a concern I too share. It is the reason I ask these questions. It feels quite odd that the motivation for my questions is also used to dismiss them. (FWIW, no physicist nor former physicist has ever responded to me this way.)


agentcoops 255 days ago

Triples are fantastic for information retrieval, but I think if there's any takeaway from the unexpected success of LLMs it's that AI researchers historically undervalued language as such. Early symbolic approaches to AI retrospectively appear torn between reverence towards and hatred of language: on the one hand, a sensible skeptical doubt that language is within reach of software systems; on the other, a belief in the inadequacy of language in the unambiguous representation of knowledge. This paper just seems to confirm that, at least at the training level, the "problem of hallucinations" is not to be resolved by regression back to the various proposals to separate knowledge from its linguistic representation.

Again, this isn't to demonize symbolic AI or to say the answer isn't in the fusion of LLMs with knowledge graphs etc, but I think we now at least know that language is certainly within reach of software and that linguistic representations of knowledge are information-dense in ways we didn't previously anticipate.
