Hacker News
by half-kh-hacker 231 days ago
the paper already says "Butter-Bench evaluates a model's ability to 'pass the butter' (Adult Swim, 2014)" so