|
|
|
|
|
by Areena_28
72 days ago
|
|
The way we handle it is keeping a small set of fixed test cases that we never change. Like same inputs, same expected outputs. so when we tweak a prompt we run it against those first. if it passes the fixed cases and feels better on the new ones, we keep it. |
|