Hacker News
by wasabi991011 18 days ago
It isn't benchmaxxed because they are using human preference as an evaluation.