by JoelEinbinder
677 days ago
On the full set of 1000 questions, the language models get 30-35% correct. With patience, humans can do 40-50%. The language models were prompted with the text plus each candidate answer, and the candidate with the lowest perplexity was picked. I tried to avoid instruction-tuned models wherever possible to avoid the "voice" problem.
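For anyone curious, here is a minimal sketch of what "pick the candidate with the lowest perplexity" could look like. It is not the actual quiz code: `pick_answer`, `token_logprobs_fn`, and `toy_logprobs` are hypothetical names, the toy scorer stands in for a real model, and scoring the whole text plus candidate (rather than only the candidate tokens) is an assumption.

```python
import math
from typing import Callable, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_answer(
    context: str,
    candidates: List[str],
    token_logprobs_fn: Callable[[str], List[float]],
) -> str:
    """Return the candidate whose (context + candidate) text has the lowest perplexity."""
    return min(candidates, key=lambda c: perplexity(token_logprobs_fn(context + c)))


def toy_logprobs(text: str) -> List[float]:
    # Hypothetical stand-in for a language model: one "token" per word,
    # familiar words get probability 0.5, everything else 0.01.
    known = {"the", "cat", "sat", "on", "mat"}
    return [math.log(0.5) if w in known else math.log(0.01) for w in text.split()]


best = pick_answer("the cat sat on the ", ["mat", "xylophone"], toy_logprobs)
print(best)
```

With a real model, `token_logprobs_fn` would return the per-token log-probabilities the model assigns to the text; everything else stays the same.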
|
the task of "predicting the next word" can be understood as either "correctly choosing the next word in the hidden context" or "predicting the likelihood of each possible word".
the quiz evaluates the former, but humans are still far from being able to express a percentage likelihood for each possibility.
i only consciously arrive at a vague feeling of confidence, rather than being able to weigh the prediction of each word with fractional precision.
one might say that LLMs have above-human introspective ability in that regard.
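a rough sketch of the difference between the two readings, with made-up numbers (the words and probabilities are purely illustrative, and `top1_correct` / `log_loss` are just names i picked). both guessers choose the same word, so a quiz scored on the top choice can't tell them apart, but a likelihood-based score can:

```python
import math
from typing import Dict


def top1_correct(probs: Dict[str, float], truth: str) -> bool:
    """Reading 1: did the single most likely word match the hidden word?"""
    return max(probs, key=probs.get) == truth


def log_loss(probs: Dict[str, float], truth: str) -> float:
    """Reading 2: negative log of the probability given to the hidden word (lower is better)."""
    return -math.log(probs[truth])


# Illustrative distributions only, not measured from any human or model.
vague = {"mat": 0.4, "rug": 0.3, "sofa": 0.3}    # "a vague feeling of confidence"
sharp = {"mat": 0.9, "rug": 0.05, "sofa": 0.05}  # well-separated probabilities

truth = "mat"
for name, guess in (("vague", vague), ("sharp", sharp)):
    print(name, top1_correct(guess, truth), round(log_loss(guess, truth), 3))
```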