Hacker News new | ask | show | jobs
by Areena_28 72 days ago
The way we handle it is keeping a small set of fixed test cases that we never change. Like same inputs, same expected outputs. so when we tweak a prompt we run it against those first. if it passes the fixed cases and feels better on the new ones, we keep it.
1 comments

How you get deterministic output though? t=0? Pydantic AI outputs?