by judahmeek
140 days ago
How hard is benchmarking models, actually? We have a lot of benchmarks available, and modifying at least some of them doesn't seem particularly difficult: https://arc.markbarney.net/re-arc To reduce cost and maintain credibility, we could run the benchmarks through a public CI system. What am I missing here?