| I think I get where you are coming from, but still, how? I feel we are discussing two different use cases. 1) Prompt 1:
“You are a weighted random choice generator. About 80% of the time please say ‘left’ and about 20% of the time say ‘right’. Simply reply with left or right. Do not say anything else.” 2) Assume that the training data gives examples of
2.1) single coin flips
2.2) multiple coin flips
Now consider a slightly different prompt. 3) Prompt 2: same as prompt 1, except the model produces 1000 lefts/rights in the same response (l,l,l,l,r,l,l,l…).
I think what you are describing is prompt 2. I just did a quick test with GPT-4 and got a 27-3 split when using prompt 2. For prompt 1, however, you get only left. To me this makes sense, because running prompt 1 100 times should go like this:
Pass 1: the LLM receives the prompt and parses it, then predicts the next token. The next token should be left.
Pass 2: same as pass 1.
For prompt 1, every prompt submission is a tabula rasa. So the model will say left, which is the correct answer for the active universe of valid prompt responses according to the model. Unless I am reading you wrong and you are saying the model is actually acting as a weighted coin flip. In theory, the LLM should be more responsive if you ask it to follow a 60:40 or 50:50 split for pass 1. I'll see if I can test this later. (Heck, now I'm more concerned about the cases where it does manage to apply the distribution.) |
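To make the pass-by-pass argument concrete, here is a toy sketch in Python. It is not a real LLM call: `NEXT_TOKEN_PROBS` is a made-up next-token distribution, and `greedy_pass` and `sampled_pass` are hypothetical stand-ins for two ways a single independent pass could pick its one token. Nothing in this thread establishes which of the two (if either) GPT-4 resembles for prompt 1; the point is only that "always pick the most likely token" gives 100 lefts, while a split across independent passes would require sampling.

```python
import random
from collections import Counter

# Assumed, illustrative distribution over the first output token for prompt 1.
# These numbers are NOT measured from any model.
NEXT_TOKEN_PROBS = {"left": 0.8, "right": 0.2}


def greedy_pass(probs):
    """One independent pass that always returns the most likely token."""
    return max(probs, key=probs.get)


def sampled_pass(probs, rng):
    """One independent pass that draws a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def run_passes(n, pass_fn):
    """Run n independent passes (no shared history) and tally the answers."""
    return Counter(pass_fn() for _ in range(n))


rng = random.Random(0)  # fixed seed so the toy run is repeatable
greedy_counts = run_passes(100, lambda: greedy_pass(NEXT_TOKEN_PROBS))
sampled_counts = run_passes(100, lambda: sampled_pass(NEXT_TOKEN_PROBS, rng))

print("greedy :", dict(greedy_counts))
print("sampled:", dict(sampled_counts))
```

Under these assumptions the greedy run is all left, which matches what I saw for prompt 1. The sampled run lands near 80/20 only because I hard-coded those weights; whether a real model's token probabilities line up with the percentages in the prompt is exactly the open question.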
So it’s just neat that the weights in the coin flip don’t match what is asked for.