by woeirua 33 days ago
Ok, but you can just look at the METR curve. Mythos saturated the 50% time horizon. The 80% horizon is now at 3 hours. The rate of progress is accelerating, not slowing down. There’s no indication yet that this is a sigmoid!
1 comment

The METR task set contains no tasks with a duration greater than 32 hours (conservatively eyeballed from Figure 3: https://arxiv.org/abs/2503.17354 ), so any prediction that naively forecasts a longer time horizon is trivially incorrect. I guess that won't lead to a sigmoid-looking graph though, since METR will likely switch to a different evaluation methodology at that point and stop updating the old curve.
METR themselves say that any estimate above 16 hours is highly suspect because there are too few tasks of that length.
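
For a rough sense of how much headroom that leaves, here is a back-of-the-envelope sketch. It assumes the horizon keeps growing by clean doublings, which is my simplification and not something METR states. The 3 hour figure is the 80% horizon quoted upthread, and 16 and 32 hours are the thresholds mentioned above. The function names and the `doubling_months` parameter are hypothetical, and I am deliberately not plugging in a doubling time.

```python
import math


def doublings_to_ceiling(current_hours: float, ceiling_hours: float) -> float:
    """Doublings left before a time horizon reaches the task-set ceiling.

    Returns 0.0 if the horizon is already at or past the ceiling.
    """
    if current_hours <= 0 or ceiling_hours <= 0:
        raise ValueError("durations must be positive")
    return max(0.0, math.log2(ceiling_hours / current_hours))


def months_to_ceiling(current_hours: float, ceiling_hours: float,
                      doubling_months: float) -> float:
    """Calendar time to the ceiling for an assumed (hypothetical) doubling time."""
    return doublings_to_ceiling(current_hours, ceiling_hours) * doubling_months


if __name__ == "__main__":
    # 3 h: 80% horizon quoted upthread; 16 h and 32 h: thresholds from this comment.
    print(doublings_to_ceiling(3, 16))  # about 2.4 doublings
    print(doublings_to_ceiling(3, 32))  # about 3.4 doublings
```

So under that assumption the 80% horizon is only two to four doublings away from the range where the task set stops being informative, whatever doubling time you believe in.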

I expect benchmarks like ProgramBench will replace METR this year.