by drc500free 258 days ago
That's not a 50% success rate at completing the task; that's the win rate in a head-to-head comparison between an algorithm and an expert. 50% means the expert and the algorithm each "win" half the time.

For the METR rating (first half of the article), it is indeed a 50% success rate at completing the task. The win rate only applies to the GDPval rating (second half of the article).
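
To make the difference between the two numbers concrete, here is a minimal Python sketch. The function names and the data are made up for illustration and are not taken from METR or GDPval; it only shows that "50%" can come from two different kinds of measurement.

```python
def success_rate(outcomes):
    """Fraction of task attempts the model completed (True = completed)."""
    return sum(outcomes) / len(outcomes)


def win_rate(preferences):
    """Fraction of head-to-head comparisons where the model's output was preferred."""
    return sum(1 for p in preferences if p == "model") / len(preferences)


# Hypothetical data: four task attempts, graded pass/fail on their own.
attempts = [True, False, True, False]

# Hypothetical data: four comparisons, each picking the model or the expert.
comparisons = ["model", "expert", "expert", "model"]

print(success_rate(attempts))   # share of tasks completed
print(win_rate(comparisons))    # share of comparisons won against the expert
```

Both calls give 0.5 here, but the first says the model finishes half the tasks, while the second says its output is preferred over the expert's half the time.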