by tripletao
760 days ago
If you're referring to the study at https://news.ycombinator.com/item?id=40386571 , then it wasn't a canonical Turing test. The preprint accurately describes and analyzes their (indefensibly bad) experiment, but the popular press has mischaracterized it.

The canonical test gives the interrogator two witnesses, one human and one machine, and asks them to judge which witness is human. The interrogator knows that exactly one witness is human. In that test, a 50% chance of a right answer means the machine is indistinguishable from a human. (Turing actually proposed a lower pass threshold, perhaps for statistical convenience.)

But that study gave the interrogator one witness, and asked them to judge whether it was human. The interrogator wasn't told anything about the prior probability that their witness was human. The probabilities that a real human is judged human and that GPT-4 is judged human sum to >100%, which nothing prevents, since the two judgments are made separately rather than as a binary comparison. So 50% has no particular meaning. The result is effectively impossible to interpret, since it's a function both of the witness's performance and of whatever assumption the interrogator makes about the unspecified prior.
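To make the difference between the two designs concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration, not something from the study: each witness is modeled as producing a Gaussian "humanlikeness" score, `d` is a hypothetical gap between the human and machine means, and `criterion` is a hypothetical cutoff standing in for whatever the interrogator assumes about the prior.

```python
from math import erf, sqrt


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def paired_accuracy(d: float) -> float:
    """Two-witness design: the interrogator picks the higher-scoring witness.

    Assumed model: human score ~ N(d, 1), machine score ~ N(0, 1), so the
    difference is N(d, 2) and the pick is right with probability phi(d / sqrt(2)).
    """
    return phi(d / sqrt(2.0))


def judged_human_rates(d: float, criterion: float) -> tuple:
    """One-witness design: the interrogator says "human" if score > criterion.

    Returns (P(human judged human), P(machine judged human)). A lower
    criterion models an interrogator who assumes humans are more common.
    """
    human_rate = 1.0 - phi(criterion - d)
    machine_rate = 1.0 - phi(criterion)
    return human_rate, machine_rate


if __name__ == "__main__":
    d = 0.0  # machine indistinguishable from human under this model
    print("paired accuracy:", paired_accuracy(d))
    for criterion in (-1.0, 0.0, 1.0):
        print(criterion, judged_human_rates(d, criterion))
```

With `d = 0` the paired design lands on 50% no matter what, while the one-witness "judged human" rate moves freely with `criterion`, and the human and machine rates can add up to more than 100%. That is the sense in which 50% is only a meaningful threshold in the paired design.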