
No, because you're confusing loss functions: an LLM makes a probabilistic prediction, not a hard decision. That is the optimal strategy only if you have something like a 0-1 loss function†, akin to betting on a coin flip, which is not a proper scoring rule (and not easily differentiable either).

Whereas LLMs are usually trained with a proper scoring rule, like log loss (cross-entropy), which incentivizes them to report calibrated predictions. For that, the optimal prediction is just '50%', perhaps transformed into log-odds and expressed as whatever the equivalent of '50%' is over the BPE vocabulary.
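
A minimal sketch of that claim, assuming a single binary event (heads vs tails) rather than a real token vocabulary. The function names and the grid search are mine, purely for illustration. It checks two proper scoring rules, log loss and squared error (Brier score), and finds the report q that minimizes expected loss when the true probability is p:

```python
import math

def expected_log_loss(q, p):
    """Expected log loss for reporting P(heads)=q when the true P(heads)=p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def expected_brier(q, p):
    """Expected squared error (Brier score) for the same report."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

def best_report(loss, p, steps=1000):
    """Brute-force search over q in {1/steps, ..., (steps-1)/steps}."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda q: loss(q, p))

for p in (0.5, 0.7):
    # Under either rule, the minimizing report is the true probability itself.
    print(p, best_report(expected_log_loss, p), best_report(expected_brier, p))
```

So for a fair coin the loss-minimizing output is '50%', not a confident call for either side; shading the report toward 0 or 1 only raises the expected loss.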

† eg if you are betting $1 on whether heads or tails comes up, it is true that you can't do better than always betting $1 on the side with P>50% - and strikingly, this is not what people do in setups like the spinner game (or Twitter polls): they 'probability match', which is optimal in terms of Thompson sampling, as if they were playing an indefinitely long repeated bandit to minimize regret. I usually take this as an example of System I vs System II: it shows how hard it is to break our real-world-appropriate intuitive behavior in artificial game setups. If you think about it, in the usual spinner game, probability matching is just straightforwardly wrong and it's not like a bandit at all; but you do have to think about it.
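
To put rough numbers on the footnote, here is a small sketch assuming a hypothetical spinner that lands on the favored side with a known probability p, with independent spins. The function names are illustrative, not from any library. It compares the expected hit rate of always betting the favored side against probability matching, where you bet each side as often as it comes up:

```python
def accuracy_maximizing(p):
    """Always bet on the more likely side: you win with probability max(p, 1-p)."""
    return max(p, 1 - p)

def accuracy_matching(p):
    """Bet side A with probability p and side B with probability 1-p,
    independently of the spin: you win with probability p*p + (1-p)*(1-p)."""
    return p * p + (1 - p) * (1 - p)

for p in (0.5, 0.7, 0.9):
    print(p, accuracy_maximizing(p), accuracy_matching(p))
```

With p = 0.7, always betting the favored side wins about 70% of the time, while matching wins about 58%. The two only tie at p = 0.5 (or when the outcome is certain), which is why matching is a mistake under 0-1 loss when the odds are already known and there is nothing left to explore.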