by kelseyfrog 784 days ago
At some point, the logits at a branching point in the response need to correspond to the probabilities of the requested output classes, so that they can be sampled appropriately and strongly condition the remainder of the response. My instinct says this cannot be accomplished irrespective of temperature, but I could be persuaded, with math.
2 comments

Provided a constant temperature of 1.0, you can train the model on prompts with probabilistic requests, with the loss determined by the KL divergence between the requested distribution and the model's sampling distribution.

Expectation: 80% left, 20% right

Model sampling probability: 99% left, 1% right

>>> 0.80 * math.log(0.99 / 0.80) + 0.20 * math.log(0.01 / 0.20)

-0.42867188234223175

Model sampling probability: 90% left, 10% right

>>> 0.80 * math.log(0.9 / 0.80) + 0.20 * math.log(0.1 / 0.20)

-0.04440300758688229
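
A note on sign: the two expressions above sum p * log(q / p), which is the negative of KL(p || q), so a value closer to zero means a closer match. A minimal sketch of the same comparison written as a non-negative loss follows; `kl_divergence` is a hypothetical helper name, not something from the thread, and the distributions are the ones quoted above.

```python
import math

def kl_divergence(target, model):
    """KL(target || model) in nats for two discrete distributions given as lists."""
    # Terms with zero target probability contribute nothing to the sum.
    return sum(p * math.log(p / q) for p, q in zip(target, model) if p > 0)

target = [0.80, 0.20]  # requested: 80% left, 20% right

print(kl_divergence(target, [0.99, 0.01]))  # roughly 0.4287, a poor match
print(kl_divergence(target, [0.90, 0.10]))  # roughly 0.0444, a closer match
print(kl_divergence(target, [0.80, 0.20]))  # 0.0 when the model matches exactly
```

Minimizing this quantity pushes the model's sampling distribution toward the requested one, and it reaches zero only when the two agree.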

Of course, if you change the temperature, this will break any probabilistic expectations learned from training in this manner.
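
To make the temperature point concrete, here is a rough sketch assuming the model has learned logits that give exactly 80/20 at temperature 1.0 and that sampling uses a standard temperature-scaled softmax. `softmax_with_temperature` is an illustrative name, not from the thread.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the sampling temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Assumption: logits trained so that T=1.0 yields 80% left, 20% right.
logits = [math.log(0.80), math.log(0.20)]

for t in (1.0, 0.5, 2.0):
    left, right = softmax_with_temperature(logits, t)
    print(f"T={t}: left={left:.3f}, right={right:.3f}")
```

Under these assumptions, the 80/20 split becomes roughly 94/6 at T=0.5 and roughly 67/33 at T=2.0, so the trained probabilities only hold at the temperature used in training.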

Or you can just add some randomness to the prompt by adding “Your random seed is mciifjrbdifnf.”

I just tested that and got 4 left and 2 right, so it works pretty well.
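
For anyone who wants to try the seed trick, a small sketch of building such a prompt is below. The helper name `with_random_seed`, the 13-letter lowercase seed, and the example request text are all assumptions for illustration; the thread only gives the "Your random seed is ..." sentence.

```python
import random
import string

def with_random_seed(prompt, length=13, rng=random):
    """Append a random lowercase 'seed' sentence to a prompt."""
    seed = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{prompt} Your random seed is {seed}."

# Hypothetical request text; each call produces a different seed string.
print(with_random_seed("Answer 'left' 80% of the time and 'right' 20% of the time."))
```

Sending a fresh prompt like this for each request and tallying the answers is one way to check how close the observed split is to the requested one.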