by magicalhippo 255 days ago
I'm sure there's other work, but I came across this in the Physics of Language Models paper[1] on knowledge extraction.

Essentially, they found that when the knowledge is presented in a single, fixed way, the model is trained to reproduce that exact sequence of tokens rather than "internalizing" the knowledge.

By varying the sentences, the model instead manages to separate out the knowledge, so to speak. This in turn drastically improves how well they can extract that knowledge later.

[1]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250633

3 comments

Of course, because the unseen part here is that the model is being taught that every other representation of the same fact is wrong.

Meaning, during training, if the model expresses the same fact in some other form, maybe even with just one extra comma, that response will be marked just as wrong as a really wrong one.

In fact, the model may give an answer that’s better than the one in the training set - but it will still be punished for it and forced to change its weights because the answer doesn’t match token-for-token.

We don’t have a loss function for meaning. We only have one for token matching. Anyone who is serious about curating datasets for fine-tuning needs to take this into account.
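
To make the token-matching point concrete, here is a minimal sketch of per-token cross-entropy in plain Python. The prompt, the three next-token distributions, and the `token_nll` helper are all made up for illustration; a real trainer computes this over the full vocabulary at every position of the sequence.

```python
import math

def token_nll(predicted: dict[str, float], target: str) -> float:
    """Cross-entropy at one position: only the target token's probability matters."""
    # Tiny floor avoids log(0) if the target token is missing from the toy distribution.
    return -math.log(max(predicted.get(target, 0.0), 1e-12))

# Hypothetical prompt: "The capital of France is", reference continuation: "Paris".
target = "Paris"

# Three made-up next-token distributions (each sums to 1).
exact = {"Paris": 0.90, "the": 0.05, "Berlin": 0.05}       # matches the reference
paraphrase = {"Paris": 0.05, "the": 0.90, "Berlin": 0.05}  # heading toward "the city of Paris"
wrong = {"Paris": 0.05, "the": 0.05, "Berlin": 0.90}       # factually wrong

loss_exact = token_nll(exact, target)
loss_paraphrase = token_nll(paraphrase, target)
loss_wrong = token_nll(wrong, target)

print(f"exact:      {loss_exact:.3f}")
print(f"paraphrase: {loss_paraphrase:.3f}")
print(f"wrong:      {loss_wrong:.3f}")
```

The paraphrase and the wrong answer get the same loss at this position, because the loss only looks at how much probability went to the reference token, not at what the rest of the probability mass means.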

That's consistent with other research I've seen, where varied presentation of the data is key to effective knowledge injection [1].

My assumption, based on the research, is that training on different prompts with the same answer gives you more robust Q&A behavior, training on variations of how to express the same concept generalizes, and training on the same prompt with different answers gives you creative diversity [2].

[1] https://arxiv.org/abs/2404.00213
[2] https://arxiv.org/abs/2503.17126
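
One way to act on this when curating a fine-tuning set is to generate several phrasings per fact. Below is a minimal sketch, assuming a simple template approach; the example fact, the templates, the `prompt`/`completion` record layout, and the `make_examples` helper are hypothetical and not taken from either paper.

```python
from itertools import product

def make_examples(fact: dict[str, str],
                  prompts: list[str],
                  answers: list[str]) -> list[dict[str, str]]:
    """Pair every prompt phrasing with every answer phrasing for one fact."""
    return [
        {"prompt": p.format(**fact), "completion": a.format(**fact)}
        for p, a in product(prompts, answers)
    ]

# Illustrative fact and templates (assumptions, not from the cited work).
fact = {"name": "Ada Lovelace", "year": "1815"}

prompt_templates = [
    "When was {name} born?",
    "What year was {name} born in?",
    "{name}'s birth year?",
]
answer_templates = [
    "{name} was born in {year}.",
    "In {year}.",
    "The birth year of {name} is {year}.",
]

examples = make_examples(fact, prompt_templates, answer_templates)
for ex in examples:
    print(ex["prompt"], "->", ex["completion"])
```

Passing a single answer template gives the "different prompts, same answer" case, and a single prompt template with several answer templates gives the "same prompt, different answers" case.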

It's the same for humans. This is the main argument against rote memorization.