
Yes, but most likely it's marked as false or incorrect through fine-tuning or some form of reinforcement.

The idea that the logprob of a token is proportional to the number of times it appears in the training data is not true.

For example, suppose that A is a common misconception that is repeated often on Reddit, while B appears in scholarly textbooks, papers, and other higher-reputation sources. Through reinforcement the logprobs of B can increase, and they can increase consistently in contexts like "This is true" and, conversely, decrease in contexts like "This is not true".
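To make that concrete, here is a toy sketch, not how any real LLM is trained. It assumes a two-token "model" whose starting logits are just log counts from a made-up corpus (so it begins as a pure frequency model), and then applies a simple policy-gradient style update with a hypothetical reward that favors B. The counts, reward values, step count, and learning rate are all invented for illustration.

```python
import math

def log_softmax(logits):
    """Return log-probabilities for a dict of token -> logit."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - log_z for tok, v in logits.items()}

# Hypothetical corpus counts: misconception A shows up 9x as often as B.
counts = {"A": 900, "B": 100}

# Pure frequency model: logits are log counts, so P(token) equals its share of the corpus.
pretrained_logits = {tok: math.log(c) for tok, c in counts.items()}
pretrained = log_softmax(pretrained_logits)

# Hypothetical trainer preference: B is rewarded, A is penalized.
reward = {"A": -1.0, "B": 1.0}

def reinforce(logits, reward, steps=200, lr=0.5):
    """Exact expected policy-gradient ascent on the logits (no sampling)."""
    logits = dict(logits)
    for _ in range(steps):
        probs = {t: math.exp(lp) for t, lp in log_softmax(logits).items()}
        baseline = sum(probs[t] * reward[t] for t in probs)  # expected reward
        for t in logits:
            # d E[reward] / d logit_t = p_t * (reward_t - E[reward])
            logits[t] += lr * probs[t] * (reward[t] - baseline)
    return logits

tuned = log_softmax(reinforce(pretrained_logits, reward))

print("before:", pretrained)
print("after: ", tuned)
```

Before the update, A has the higher logprob because it is more frequent. After the update the ordering flips, even though the corpus counts never changed, which is the point: frequency sets the starting point, and the reward signal can move it.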

So the presumptions and values of its trainers are also embedded into the LLM in addition to those of the authors of the text corpus.