Hacker News
by nikisweeting 62 days ago
We can definitely make harder evals; the problem is that a good eval set is indistinguishable from good training data / market edge, so no one is incentivized to share their best eval sets publicly.