by lgessler
603 days ago
I'm afraid this piece confuses more than it clarifies. First, saying a model "scours through its vast training data" is misleading at best: at inference time, LLMs have no direct access to their training data; they have access to it only insofar as it has been encoded in their parameters.

Second, saying "Every instance of an answer in the training data is like a vote" doesn't give the full picture. There are embedded contexts where "votes" can be negated: compare "the Earth is flat." with "We know it's false that the Earth is flat." or "Only a fool thinks the Earth is flat." All three contain the substring "the Earth is flat.", but both humans and LLMs can use context to understand that the latter two sentences do the opposite of endorsing the proposition that the Earth is flat. You could even imagine an extended satirical bit with "the Earth is flat" embedded in it, where it is clear to a reader that all of its content is meant to be farcical, and I'd wager that an LLM would in many cases recognize this. So the voting metaphor breaks down here: it makes you think that the LLM is just keeping a tally of propositions, when really it is doing something a bit more sophisticated.

I don't disagree with the premise, of course. LLM overhype is real. But we should be skeptical for the right reasons. Anna Rogers and Sasha Luccioni have a paper I really like: https://openreview.net/pdf?id=M2cwkGleRL