by veselin 815 days ago
I think this is simply the default of lm-evaluation-harness. They said they ran every single benchmark they could out of the box.