|
|
|
|
|
by magicalhippo
255 days ago
|
|
I'm sure there's other work, I came across this in the Physics of Language Model paper[1] on knowledge extraction. Essentially they found that by presenting the knowledge in a single, fixed way, the model is trained to reproduce that exact sequence of tokens, rather than "internalizing" the knowledge. By varying the sentences, the model instead manages to separate out the knowledge, so to speak. This in turn drastically improves how well they can extract that knowledge later. [1]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250633 |
|
Meaning, during training, if the model expresses the same fact in some other form, maybe even with just one extra comma, that response will be marked just as wrong as a really wrong one.
In fact, the model may give an answer that’s better than the one in the training set - but it will still be punished for it and forced to change its weights because the answer doesn’t match token-for-token.
We don’t have a loss function for meaning. We only have one for token matching. Anyone who is serious about curating datasets for fine-tuning needs to take this into account.