Hacker News
by trq_ 146 days ago
Yes, we do, but harnesses are hard to eval: people use them across a huge variety of tasks, and sometimes different behaviors trade off against each other. We have added some evals to catch this one in particular.
2 comments

Can't you keep the model the same, until the user chooses to use a different model?
He said it was the harness, not the model though.
Thank you. Fair enough.