by entee
257 days ago
A lot of this post relies on the recent OpenAI result they call GDPval (link below). They note some limitations (lack of iteration in the tasks, among others) which are key complaints and possibly fundamental limitations of current models. More interesting, though, is the 50% win-rate stat that represents expert human performance in the paper. That seems absurdly low: most employees don’t have a success rate as low as 50% on self-contained tasks that take ~1 day of work. That means at least one of a few things could be true:

1. The tasks aren’t defined in a way that makes real-world sense.
2. The tasks require iteration for real-world success (as many tasks do), which wasn’t tested.

I think that while this paper is interesting and a very worthy research avenue, it is only the first in a still-early area of understanding how AI will affect the real world, and it’s hard to project well from this one paper.

https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf1...