Hacker News
by judahmeek 140 days ago
How hard is benchmarking models actually?

We've got a lot of benchmarks available, and modifying at least some of them doesn't seem particularly difficult: https://arc.markbarney.net/re-arc
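To make "modifying a benchmark" concrete, here's a rough sketch of one simple kind of variation for an ARC-style grid task: relabeling the colors consistently across every input/output pair. This is only my illustration, not necessarily what the linked project does. The function name `permute_colors` and the task layout (a list of input/output grid pairs, with 0 as background) are assumptions, and the trick only preserves tasks whose rule doesn't depend on specific colors.

```python
import random

Grid = list[list[int]]


def permute_colors(pairs: list[tuple[Grid, Grid]], seed: int) -> list[tuple[Grid, Grid]]:
    """Relabel colors 1-9 with one seeded permutation; 0 (background) is kept.

    Hypothetical helper: the same mapping is applied to every grid in the
    task, so the input/output relationship is preserved.
    """
    rng = random.Random(seed)  # seeded, so the variant is reproducible
    colors = list(range(1, 10))
    shuffled = colors[:]
    rng.shuffle(shuffled)
    mapping = {0: 0, **dict(zip(colors, shuffled))}

    def relabel(grid: Grid) -> Grid:
        return [[mapping[cell] for cell in row] for row in grid]

    return [(relabel(inp), relabel(out)) for inp, out in pairs]


# Toy task: a single input/output pair on a 2x2 grid.
task = [([[0, 1], [2, 0]], [[1, 0], [0, 2]])]
variant = permute_colors(task, seed=7)
```

The same seed always gives the same variant, so anyone can regenerate it.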

To reduce cost & maintain credibility, we could have the benchmarks run through a public CI system.
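Roughly what I have in mind for the CI part, as a sketch only: the job derives its seed from the public commit SHA and publishes a hash of the exact tasks it scored, so anyone can re-run and compare. Everything here is hypothetical (`seed_from_commit`, `run_benchmark`, the `model` callable, the report fields), and it only covers the reproducibility side, not the cost of actually running a model.

```python
import hashlib
import json
from typing import Callable


def seed_from_commit(commit_sha: str) -> int:
    """Derive a deterministic seed from a public commit SHA (assumed scheme)."""
    return int(hashlib.sha256(commit_sha.encode()).hexdigest()[:16], 16)


def run_benchmark(
    model: Callable[[str], str],
    tasks: list[tuple[str, str]],
    commit_sha: str,
) -> dict:
    """Score exact-match accuracy and return a report that can be published."""
    correct = sum(1 for prompt, expected in tasks if model(prompt) == expected)
    # Hash the exact task set so a third party can confirm what was scored.
    payload = json.dumps(tasks, sort_keys=True).encode()
    return {
        "commit": commit_sha,
        "seed": seed_from_commit(commit_sha),
        "tasks_sha256": hashlib.sha256(payload).hexdigest(),
        "accuracy": correct / len(tasks),
    }


# Stand-in "model" that just upper-cases its prompt.
report = run_benchmark(lambda p: p.upper(), [("a", "A"), ("b", "X")], "abc123")
print(json.dumps(report, indent=2))
```

In a real setup the seed would feed whatever generates the task variants, and the report would be uploaded as a build artifact.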

What am I missing here?