Hacker News
by
trq_
146 days ago
Yes, we do, but harnesses are hard to eval: people use them across a huge variety of tasks, and sometimes different behaviors trade off against each other. We have added some evals to catch this one in particular.
2 comments
amelius
146 days ago
Can't you keep the model the same until the user chooses a different one?
rovr138
146 days ago
He said it was the harness, not the model, though.
hu3
146 days ago
Thank you. Fair enough.