Hacker News
by leogao, 1831 days ago
Thankfully, there already exist evaluation tasks like that, and Eleuther actually has a project collecting a handful of them together; see https://github.com/EleutherAI/lm-evaluation-harness/