by judahmeek 140 days ago

How hard is benchmarking models actually?

We've got plenty of benchmarks available, and modifying at least some of them doesn't seem particularly difficult: https://arc.markbarney.net/re-arc
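To make "modifying" concrete, here's a rough sketch of one cheap way to do it. I'm assuming ARC-style tasks stored as pairs of integer grids with colors 0-9, and I'm not claiming this is how the linked re-arc project works. The names `permute_colors` and `make_variant` are made up for illustration. The idea is to relabel colors consistently with a seeded shuffle, so the puzzle's logic is unchanged but a memorized answer no longer matches.

```python
import random

def permute_colors(grid, mapping):
    """Relabel every cell of a grid using a color -> color mapping."""
    return [[mapping[cell] for cell in row] for row in grid]

def make_variant(task, seed):
    """Return a recolored copy of a task.

    task: list of (input_grid, output_grid) pairs, cells are ints 0-9.
    Assumption: color 0 is background and stays fixed.
    """
    rng = random.Random(seed)      # seeded, so anyone can regenerate the same variant
    colors = list(range(1, 10))
    shuffled = colors[:]
    rng.shuffle(shuffled)
    mapping = {0: 0, **dict(zip(colors, shuffled))}
    # The same mapping is applied to inputs and outputs, keeping the task consistent.
    return [
        (permute_colors(inp, mapping), permute_colors(out, mapping))
        for inp, out in task
    ]

if __name__ == "__main__":
    task = [([[0, 1], [2, 1]], [[1, 0], [1, 2]])]
    print(make_variant(task, seed=7))
```

Because the variant depends only on the task and the seed, a public run could publish the seed after the fact and anyone could check the exact items that were used.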

To reduce cost & maintain credibility, we could have the benchmarks run through a public CI system.

What am I missing here?