by woeirua, 36 days ago
METR themselves say that any estimate >16 is highly suspect because there are too few tasks.
I expect benchmarks like ProgramBench will replace METR this year.