Hacker News
by leogao 1831 days ago
Thankfully, there already exist evaluation tasks like that, and Eleuther actually has a project collecting a handful of them together; see https://github.com/EleutherAI/lm-evaluation-harness/