HN Mirror



	by Leynos 87 days ago
	Use evals. Coming soon: unit, behavioural and regression tests for your prompts and skills :P


stingraycharles 87 days ago

How do you use evals when you’re using Claude Code, given that Claude Code also changes their prompts all the time?

You’ll have:

* Claude model version

* Claude Code prompts and tools

* Your own prompts and skills and whatnot

* Your repository’s source code (= the input)

All of those change constantly; it’s not like it’s some kind of SWE benchmark.
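One way to keep those moving parts visible is to record all four next to every eval result, so that a change in scores can at least be traced to what changed. A minimal sketch in Python, assuming each part can be identified by a version string or a content hash; `EvalContext`, `content_hash` and `changed_components` are hypothetical names, not part of Claude Code or any eval library.

```python
import hashlib
from dataclasses import dataclass, fields


def content_hash(text: str) -> str:
    """Short SHA-256 digest of a prompt, skill file or source snapshot."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class EvalContext:
    # Hypothetical fields: one per moving part in the list above.
    model_version: str    # pinned model identifier
    harness_version: str  # Claude Code version (its prompts and tools)
    prompts_hash: str     # hash of your own prompts and skills
    repo_revision: str    # commit of the repository under test (the input)


def changed_components(old: EvalContext, new: EvalContext) -> list[str]:
    """Names of the components that differ between two eval runs."""
    return [f.name for f in fields(EvalContext)
            if getattr(old, f.name) != getattr(new, f.name)]


# Usage with made-up identifiers: only the skill text differs.
baseline = EvalContext("model-a", "1.0.0", content_hash("skill v1"), "abc123")
candidate = EvalContext("model-a", "1.0.0", content_hash("skill v2"), "abc123")
print(changed_components(baseline, candidate))  # ['prompts_hash']
```

If `changed_components` reports more than one name between two runs, a score difference cannot be pinned on any single component, which is exactly the difficulty raised above.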


Leynos 86 days ago

You just said it. If consistency is that important, keep consistent versions of model, harness, prompts, skills, etc., and regression-test changes. That way lies madness :)
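Pinning every component and changing one at a time can be sketched as a small regression check. This is an illustration under stated assumptions: `Setup`, `Case`, `regressions` and the toy `fake_run` are hypothetical, and `fake_run` stands in for whatever actually invokes the model and harness.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Setup:
    # Hypothetical pinned versions of each moving part.
    model: str
    harness: str
    skills: str


@dataclass(frozen=True)
class Case:
    prompt: str
    check: Callable[[str], bool]  # True if the output is acceptable


# Stand-in for invoking the agent with a given setup (hypothetical signature).
Runner = Callable[[Setup, str], str]


def regressions(run: Runner, baseline: Setup, candidate: Setup,
                cases: list[Case]) -> list[str]:
    """Prompts that pass under the baseline setup but fail under the candidate."""
    return [c.prompt for c in cases
            if c.check(run(baseline, c.prompt))
            and not c.check(run(candidate, c.prompt))]


def fake_run(setup: Setup, prompt: str) -> str:
    # Toy runner: the "v2" skills stop upper-casing the output.
    return prompt if setup.skills == "v2" else prompt.upper()


cases = [
    Case("abc", lambda out: out == "ABC"),
    Case("123", lambda out: out == "123"),
]
baseline = Setup(model="model-x", harness="1.0", skills="v1")
candidate = replace(baseline, skills="v2")  # change exactly one component
print(regressions(fake_run, baseline, candidate, cases))  # ['abc']
```

Because real agent runs are not deterministic, a practical version would likely repeat each case several times and compare pass rates rather than single outcomes.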
