by syllogism
1006 days ago
The gold standard they're comparing against was produced by humans, though. And a task-specific model trained on that data will be better at that task than GPT-4. What's definitely true is that getting decent data often takes some care, especially in how you define the task. And Mechanical Turk is often especially tricky to use well.