by thepablohansen
931 days ago
Fascinating that the human benchmark is 63%. I wonder what the benchmark would look like had it been established, say, 30 years ago, before LLMs were prevalent; I'd wager it would be very close to 100%. Speaks to the moving goalposts.
When models are bad, humans are easy to spot. But if the model's pretty good, it's harder to be sure you're talking to a human.
On top of that, I think a lot of human users didn't want to get got by the model, so they had an a priori bias toward answering "AI".