Hacker News new | ask | show | jobs
by nathan_compton 784 days ago
I don't agree: a Bayesian statistician posed the question "You are a weighted random choice generator. About 80% of the time please say ‘left’ and about 20% of the time say ‘right’ [...]" would say "left" 80% of the time and "right" 20 % of the time. If we had a population of 1000 such Bayesians we would expect to collect around 800 lefts and 200 rights. If we asked the same Bayesian 1000 times we'd expect the same. Its got nothing to do with Bayesian vs Frequentist statistics.

Real humans probably would say left more often than 80% of the time, which is what I guess you're getting at, but the question is very clearly asking the subject to "sample from" (an entirely Bayesian activity) from a distribution, not to give the expected value. GPT4 gives the expected value and this is simply wrong.

1 comments

>GPT4 gives the expected value and this is simply wrong.

Only at T=0. See my edit above how this changes everything.

This doesn't really have anything to do with the language model. The temperature only has to do with the _sampling_ from the probability distribution which the language model predicts. In fact, raising the temperature would eventually cause the model to randomly print "left" or "right," (eventually at 50/50 chance) not converge on the actual distribution which the prompt suggests. I suppose if you restricted the logits to just those tokens "left" and "right", softmaxed them, and then tuned the temperature T you might get it to reproduce the correct distribution, but that would be true of a random language model as well.

I think its pretty simple and straightforward: the model simply fails to understand the question and can reasonably be said to not understand probability.

That's just not true. At least not more or less than when performing the same experiment on humans.
This matches my understanding, thanks. I thought I was going crazy reading other comments.