Hacker News
by nojs 310 days ago
How does this ensure models haven't seen it during training? Is it a different benchmark per model release?