| I went through all the comments here and I still don't see anyone addressing this. If I am reading the author correctly, they sent the same prompt to the model 1000 times, each time in a fresh context, and only ever looked at the first output. They never let the model actually run within a single chat context. The task was simply: output the first item in a list of 'left' and 'right', favoring 'left' 80% of the time. But then the author only asked for the first output. Either this person doesn't understand how LLMs and their output sampling work, or they do and went with this method anyway, because of course it works this way. The model takes the prompt. The first output token it chooses, for this specific model, happens to be 'Left'. They end the session and prompt again. Of course the next output will be 'Left' too. They aren't letting it run in context and continue. The temperature of the model is low enough that the sampler is going to pick the 'Left' token almost every time, or at least 999/1000 in this case. It cannot start to produce an 80/20 split of Left/Right if you never give it a chance to start counting in context. Continuously stopping and re-prompting will, of course, just run the same thing again. I can't tell whether the author understands this and is pontificating on purpose, or doesn't understand it and is trying to make some profound statement about what LLMs can't do, when anyone who knows how model inference runs could have told you this. |
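
To make the sampling point concrete, here is a toy sketch of what "prompt fresh, take the first token, repeat" amounts to. This is not the author's setup or any real model: the two logits in `HYPOTHETICAL_LOGITS` and the temperature values are made-up numbers, and the helper names are mine. It only illustrates that every fresh prompt draws from the same first-token distribution, and that a low temperature pushes that distribution almost entirely onto the top token.

```python
import math
import random

# Made-up first-token logits for illustration only (not measured from any model).
HYPOTHETICAL_LOGITS = {"Left": 2.0, "Right": 0.0}


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}


def sample_first_token(logits, temperature, rng):
    """Sample one token, as if the prompt were sent in a brand-new context."""
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def count_fresh_prompts(logits, temperature, trials=1000, seed=0):
    """Repeat the fresh-context prompt `trials` times and tally first tokens.

    Nothing is carried from one trial to the next, so each draw comes from
    the same distribution no matter what the earlier draws were.
    """
    rng = random.Random(seed)
    counts = {token: 0 for token in logits}
    for _ in range(trials):
        counts[sample_first_token(logits, temperature, rng)] += 1
    return counts


if __name__ == "__main__":
    for temperature in (1.0, 0.2):
        probs = softmax_with_temperature(HYPOTHETICAL_LOGITS, temperature)
        counts = count_fresh_prompts(HYPOTHETICAL_LOGITS, temperature)
        print(f"T={temperature}: probs={probs} counts={counts}")
```

With these assumed numbers, lowering the temperature from 1.0 to 0.2 moves the probability of 'Left' from roughly 0.88 to above 0.999, so 1000 independent first-token draws come out almost all 'Left'. The sketch says nothing about what a model would do if it were allowed to continue in one context; it only shows why restarting the prompt each time cannot produce a running 80/20 tally.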