Hacker News
by stingraycharles 88 days ago
How do you use evals when you’re using Claude Code, given that Claude Code also changes its prompts all the time?

You’ll have:

* Claude model version

* Claude Code prompts and tools

* Your own prompts and skills and whatnot

* Your repository’s source code (= the input)

All of those change constantly; it’s not like some kind of SWE benchmark, where the inputs stay fixed.

1 comment

You just said it. If consistency is that important, keep consistent versions of model, harness, prompts, skills, etc., and regression test changes. That way lies madness :)
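
For what it's worth, the "pin everything and regression test changes" idea could be sketched roughly like this in Python. Everything here is an assumption made for illustration: `EvalEnvironment`, `digest_texts`, `changed_fields`, and `find_regressions` are hypothetical names, the version strings are placeholders, and nothing in it calls a real Claude Code API. It only shows how you might fingerprint the four moving parts and compare pass/fail results between two pinned setups.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvalEnvironment:
    """Hypothetical record pinning the four moving parts from the question."""
    model_version: str    # placeholder model identifier
    harness_version: str  # placeholder version of the harness under test
    prompts_digest: str   # hash of your own prompts and skills
    repo_commit: str      # commit of the repository used as input

    def fingerprint(self) -> str:
        # Stable hash over all pinned fields; changing any field changes it.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def digest_texts(texts: list[str]) -> str:
    """Order-independent digest of prompt/skill texts."""
    h = hashlib.sha256()
    for text in sorted(texts):
        h.update(text.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()[:12]


def changed_fields(old: EvalEnvironment, new: EvalEnvironment) -> list[str]:
    """Names of the pinned fields that differ between two environments."""
    a, b = asdict(old), asdict(new)
    return [name for name in a if a[name] != b[name]]


def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Eval cases that passed in the baseline but fail (or are missing) now."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )
```

You'd store each eval run's results under its fingerprint, change one pinned field at a time, and use `changed_fields` plus `find_regressions` to see which change broke which case. Whether keeping all of that pinned is worth the effort is exactly the "madness" part.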