by bluefirebrand
440 days ago
> So I personally don't think it shows LLM models can fool humans trying to unmask them. Maybe these used special LLMs that are unrestricted or something, but isn't it pretty trivial to get an LLM to output error prompts by asking them to commit crimes or talk about certain topics? I think priming people to think they might be talking to a human skews the results here, because people will be more hesitant to say really wild shit that the LLM can't react appropriately to if they think they might be talking to a human.

Perhaps the final form of this experiment will always have to take the reward into account when looking for results better than chance, since guessing with zero effort for an expected $0.5*X can be a better deal than putting in full effort for $X. We could then track how the reward needed to get interrogators to distinguish reliably increases over time. There might be a casino game in there somewhere, though collusion between human witnesses and interrogators might become a problem as the stakes get high.
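
To put rough numbers on the reward argument, here is a minimal sketch. It assumes a simple model that is mine, not the experiment's: a correct call pays X, a random guess is right half the time, and real effort costs the interrogator a fixed amount. The function name and parameters are hypothetical.

```python
def break_even_reward(accuracy_with_effort: float, effort_cost: float) -> float:
    """Smallest reward X at which trying beats guessing at random.

    Assumed model: guessing earns 0.5 * X at no cost, while trying earns
    accuracy_with_effort * X minus effort_cost.
    """
    edge = accuracy_with_effort - 0.5  # advantage over a coin flip
    if edge <= 0:
        # Effort buys nothing over chance, so no reward justifies it.
        return float("inf")
    return effort_cost / edge


print(break_even_reward(0.75, 1.0))   # 4.0
print(break_even_reward(0.625, 1.0))  # 8.0
```

Under those assumptions, as LLMs get harder to unmask and accuracy drifts toward 50%, the reward needed to make effort worthwhile grows without bound, which is the quantity one could track over time.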