Hacker News
by snemvalts 29 days ago
What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.