| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Areena_28 118 days ago
	The way we handle it is keeping a small set of fixed test cases that we never change. Like same inputs, same expected outputs. so when we tweak a prompt we run it against those first. if it passes the fixed cases and feels better on the new ones, we keep it.

1 comments

How you get deterministic output though? t=0? Pydantic AI outputs?