| HN Mirror


by caddemon 1119 days ago

Additionally, I'd like to know how they corrected for this: "In a creative twist, many people pretended to be AI bots themselves in order to assess the response of their chat partners"

Assuming it actually was "many people", then whenever one of them was paired with a human conversational partner (who would also be voting at the end), that partner is going to have a hard time guessing correctly, which skews the results.

Like imagine playing this game as a lay person after having used ChatGPT a little bit and then getting a response to your question that says "as a large language model ...". Depending on how well the game was explained to participants, it's possible that some people even did this intentionally to fuck with results.

In a proper Turing test there is supposed to be 1 bot and 2 humans: one human is incentivized only to demonstrate they are human, and the other human, who is already known to be human, is the one asking probing questions and guessing which is which.

Anyway, I've only read the linked article and played the game a couple of times; I didn't look through the original research publication. It's certainly possible they did address some of these issues, but this is such a buzzword topic at the moment that I have my doubts. Regardless, the linked article should cover limitations. Precisely because the topic is so hyped, it's important that we hold general-audience writing about AI to a higher standard.

1 comment

caddemon 1119 days ago

Ok I have to add one more thing that's funny since I just played a couple more times: if your conversational partner is a human and they exit the window mid-chat, it still lets you vote.
