by lappa
780 days ago
Provided a constant temperature of 1.0, you can train the model on prompts with probabilistic requests, with the loss determined by KL divergence.

Expectation: 80% left, 20% right

Model sampling probability: 99% left, 1% right

  >>> 0.80 * math.log(0.99 / 0.80) + 0.20 * math.log(0.01 / 0.20)
  -0.42867188234223175

Model sampling probability: 90% left, 10% right

  >>> 0.80 * math.log(0.9 / 0.80) + 0.20 * math.log(0.1 / 0.20)
  -0.04440300758688229

(These values are the negative of KL(expected || model), so closer to zero means a closer match.)

Of course, if you change the temperature, this will break any probabilistic expectations from training in this manner.
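For anyone who wants to play with the numbers, here is a rough sketch of the same calculation as a reusable function, plus a toy illustration of the temperature caveat. The helper names `kl_divergence` and `apply_temperature` are made up for this example, and the temperature model (rescaling log-probabilities by 1/T and renormalizing) is an assumption about how sampling temperature is applied. Note that `kl_divergence` returns the positive KL value, i.e. the numbers above with the sign flipped.

```python
import math

def kl_divergence(expected, model):
    """KL(expected || model) in nats for two discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(expected, model) if p > 0)

def apply_temperature(probs, temperature):
    """Hypothetical helper: rescale log-probabilities by 1/temperature, then renormalize."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

expected = [0.80, 0.20]  # 80% left, 20% right

# The two cases from the comment, reported as positive KL.
print(kl_divergence(expected, [0.99, 0.01]))
print(kl_divergence(expected, [0.90, 0.10]))

# A model that matches 80/20 exactly at temperature 1.0 no longer matches
# once it is sampled at temperature 0.5: the distribution sharpens.
sharpened = apply_temperature(expected, 0.5)
print(sharpened)
print(kl_divergence(expected, sharpened))
```

At temperature 1.0 `apply_temperature` leaves the distribution unchanged, so the divergence from the expectation stays at zero; any other temperature shifts the sampled split away from the trained 80/20.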