by robertclaus
784 days ago
I wonder if you could actually fine-tune an LLM to do better on this. As some of the comments point out, the issue here is that the model's output token probabilities, combined with the sampling temperature, don't actually produce the probabilities requested in the prompt. If you trained on generated data drawn from real distributions, would it learn to compensate appropriately? Would that carry over to novel probability prompts?
If the temperature is not zero, then it seems technically possible for the output tokens to be weighted closely enough in probability to each other that the randomization from temperature causes them to be printed in the appropriate distribution.
However, I'm not an LLM expert, and as far as I know people don't use a "temperature" while training the model. If so, the training step could not learn to output tokens in the requested distribution at a given temperature, because it has no access to the temperature the user will choose at sampling time.
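To make that concrete, here is a rough sketch, assuming the usual scheme where logits are divided by the temperature before a softmax. The function names and the 70/20/10 target are made up for illustration; this is not how any particular model or API is implemented.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Sampling probabilities for `logits` at a temperature > 0 (assumed scheme)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for_target(target_probs, assumed_temperature):
    """Hypothetical helper: logits that reproduce `target_probs`
    when sampled at exactly `assumed_temperature`."""
    return [assumed_temperature * math.log(p) for p in target_probs]

# Hypothetical prompt: "answer A 70% of the time, B 20%, C 10%".
target = [0.7, 0.2, 0.1]

# Logits calibrated for temperature 1.0.
logits = logits_for_target(target, assumed_temperature=1.0)

# The same logits sampled at other temperatures drift away from the target:
# lower temperatures sharpen the distribution, higher ones flatten it.
for t in (1.0, 0.5, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

So the logits only hit the requested distribution at the temperature they were calibrated for. Compensating for a different temperature means scaling the logits by that temperature, which the model cannot do unless it somehow knows the value.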
EDIT: I made the assumption that the LLM was not asked for a sequence of random numbers, but only one number per prompt. I think this fits the use case described in the article, but another use case might be asking for a sequence of such numbers, in which case training might work.