by raincole
65 days ago
No one is claiming an agent can do 50% of arbitrary tasks. It's just 50% of METR's benchmark set.

> I think you're overestimating, or oversimplifying

Yeah, if you only read comments on HN but not the actual linked article, you will get an oversimplified conclusion. Like, duh?

Curiously, for most submissions it's the opposite - comments are much more useful and nuanced than the source being discussed.