by notahacker
900 days ago
"GPT-4 achieved a success rate of 41 percent, second only to actual humans" also feels like a (much bigger) lie of omission when you look at the original paper. GPT-4's performance was in the range of 6% to 41%, ELIZA's 27% score sat in the upper middle of that range, and considering the bots tested consisted of 8 GPT-4 prompts, 2 GPT-3.5 prompts and a naive script from the 1960s, GPT-4 would have had to be remarkably consistently inhuman not to finish "second only to humans" with its highest-scoring prompt. The blog appears to have been updated to specify GPT-3.5, but the original version was accurate. The paper itself is interesting as it covers the limitations (it has big methodological issues), how the GPT prompts attempted to override the default ChatGPT tone, and the reasons why ELIZA performed surprisingly well (some thought it was so uncooperative, it must be human!)
https://arxiv.org/pdf/2310.20216.pdf