by shenberg
1176 days ago
They showed that, at least for their tasks, their task definition was well-defined enough for ChatGPT. That's exactly why the comparison is useful. MTurk is often used for these tasks in place of more expensive human annotators (e.g. grad students), and this paper says that, for their case at least, ChatGPT worked better, in the sense that, given the exact same instructions, it gave answers closer to those of the more expensive annotators. Using MTurk seriously often entails extra steps intended to verify annotator motivation, e.g. adding a question that says "select option 7" to make sure the person isn't just making random choices, or gathering more answers for questions where annotators disagreed. What these extra steps have in common is that they take more time when designing the labeling process and cost more money.
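
To make those two extra steps concrete, here is a rough sketch of what they might look like in code. Everything in it is an assumption for illustration, not anything from the paper or from MTurk's API: the `(worker_id, item_id, label)` row layout, the `attention_check` item name, the `filter_and_flag` helper, and the 0.7 agreement threshold are all made up.

```python
from collections import Counter

# Hypothetical layout: each row is (worker_id, item_id, label).
# "attention_check" stands in for the planted "select option 7" question.
ATTENTION_ITEM = "attention_check"
ATTENTION_ANSWER = "7"


def filter_and_flag(rows, min_agreement=0.7):
    """Drop workers who fail the attention check, then flag contested items."""
    # Workers who answered the planted question correctly.
    passed = {w for w, item, label in rows
              if item == ATTENTION_ITEM and label == ATTENTION_ANSWER}

    # Collect labels per real item, from workers who passed only.
    labels_by_item = {}
    for w, item, label in rows:
        if w in passed and item != ATTENTION_ITEM:
            labels_by_item.setdefault(item, []).append(label)

    majority, needs_more = {}, []
    for item, labels in labels_by_item.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        majority[item] = top_label
        # Low agreement means this item should get more (paid) answers.
        if top_count / len(labels) < min_agreement:
            needs_more.append(item)
    return majority, needs_more


rows = [
    ("w1", "attention_check", "7"), ("w1", "q1", "pos"), ("w1", "q2", "neg"),
    ("w2", "attention_check", "7"), ("w2", "q1", "pos"), ("w2", "q2", "pos"),
    ("w3", "attention_check", "3"), ("w3", "q1", "neg"), ("w3", "q2", "neg"),
    ("w4", "attention_check", "7"), ("w4", "q1", "pos"), ("w4", "q2", "neg"),
]
majority, needs_more = filter_and_flag(rows)
print(majority, needs_more)
```

In this toy data, w3 fails the check and is dropped, and q2 ends up with a 2-to-1 split among the remaining workers, so it gets flagged for more answers. Both the planted question and the extra round of answers are paid work, which is where the added cost comes from.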