| HN Mirror

Provided a constant temperature of 1.0, you can train the model on prompts with probablistic requests, with loss determined by KL divergence.

Expectation: 80% left, 20% right

Model sampling probability: 99% left, 1% right

>>> 0.80 * math.log(0.99 / 0.80) + 0.20 * math.log(0.01 / 0.20)

-0.42867188234223175

Model sampling probability: 90% left, 10% right

>>> 0.80 * math.log(0.9 / 0.80) + 0.20 * math.log(0.1 / 0.20)

-0.04440300758688229

Of course, if you change the temperature this will break any probablistic expectations from training in this manner.