Hacker News
by half-kh-hacker, 231 days ago
the paper already says "Butter-Bench evaluates a model's ability to 'pass the butter' (Adult Swim, 2014)" so