Hacker News
by woeirua 36 days ago
METR themselves say that any estimate >16 is highly suspect because there are too few tasks.

I expect benchmarks like ProgramBench will replace METR this year.