Hacker News
by djohnston 738 days ago
We benchmark against our own eval sets, which makes it easier to measure the variance that I’ve found impossible to eliminate.
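For anyone wanting to try this, here is a minimal sketch of one way to measure that variance: run the same eval several times and look at the spread of scores. This is not necessarily the setup described above. `score_variance` and `fake_eval` are hypothetical names, and `fake_eval` is a noisy stand-in for a real eval run.

```python
import random
import statistics
from typing import Callable, Sequence


def score_variance(run_eval: Callable[[int], float], seeds: Sequence[int]) -> dict:
    """Run the same eval once per seed and summarize the spread of scores."""
    scores = [run_eval(seed) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        # Sample standard deviation; requires at least two runs.
        "stdev": statistics.stdev(scores),
    }


def fake_eval(seed: int) -> float:
    """Hypothetical stand-in for a real eval run: a score near 0.80 plus noise."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0.0, 0.01)


summary = score_variance(fake_eval, seeds=range(5))
print(f"mean={summary['mean']:.3f} stdev={summary['stdev']:.3f}")
```

With a number like this in hand, a difference between two models or prompts that is smaller than the standard deviation is probably not worth reading much into.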