by sitkack 509 days ago
How are you doing your evals?

If you can run semantic diffs on the outputs of the two models, that should tell you what you need to do.
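A rough sketch of what a semantic diff over two models' outputs could look like, assuming you have the outputs for the same prompts as two parallel lists. Everything here is hypothetical: the function names, the 0.8 threshold, and above all the similarity measure, which is a plain bag-of-words cosine used as a stand-in. A real eval would probably swap in sentence embeddings or a judge model.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def semantic_diff(outputs_a, outputs_b, threshold=0.8):
    """Return (index, similarity, output_a, output_b) for pairs scoring below threshold.

    The threshold is an arbitrary assumption; tune it against pairs you have
    labeled by hand as "same meaning" or "different meaning".
    """
    flagged = []
    for i, (a, b) in enumerate(zip(outputs_a, outputs_b)):
        score = cosine_similarity(bag_of_words(a), bag_of_words(b))
        if score < threshold:
            flagged.append((i, score, a, b))
    return flagged


if __name__ == "__main__":
    # Made-up outputs, purely for illustration.
    model_a = ["The capital of France is Paris.", "Use a hash map for O(1) lookups."]
    model_b = ["the capital of france is paris", "Sort the list and binary search it."]
    for index, score, a, b in semantic_diff(model_a, model_b):
        print(f"prompt {index}: similarity {score:.2f}")
        print(f"  A: {a}")
        print(f"  B: {b}")
```

Pairs that differ only in casing or punctuation pass through quietly, so what gets printed is the list of prompts where the two models actually disagree. Those are the ones worth reading by hand.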