by djohnston 738 days ago
We benchmark on our own eval sets, which makes it easier to measure the run-to-run variance that I’ve found impossible to eliminate.
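One way to measure that variance, assuming you can rerun the same model on the same eval set several times: record the score from each run and summarize the spread. The sketch below is illustrative only. The function name `summarize_runs` and the five accuracy values are hypothetical placeholders, not results from any real benchmark.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Summarize scores from repeated runs of the same eval set.

    Hypothetical helper: `scores` is one score per run of the same
    model on the same eval set.
    """
    if len(scores) < 2:
        # A sample standard deviation needs at least two observations.
        raise ValueError("need at least two runs to estimate variance")
    return {
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample (n - 1) standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical accuracies from five runs of the same model on the same eval set.
runs = [0.82, 0.79, 0.84, 0.80, 0.81]
summary = summarize_runs(runs)
print(
    f"mean={summary['mean']:.3f} stdev={summary['stdev']:.3f} "
    f"range=[{summary['min']}, {summary['max']}]"
)
```

Reporting the standard deviation alongside the mean makes it clearer whether a difference between two model versions is larger than the noise you see from rerunning the same one.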