|
|
|
How do you catch AI agent regressions after prompt or model changes?
|
|
2 points
by 1taimoorkhan0
27 days ago
|
|
Seeing a pattern where teams fix a failure in an agent, change the prompt or model a week later, and the same failure quietly comes back. Nobody catches it until a user does.
Curious how people are handling this today. Manual test cases? Evals? Logs? Nothing?
Not trying to pitch anything. Just trying to understand how widespread this is and what current approaches look like. |
|
* If it's important, it gets written into documentation somewhere. Functional requirements, technical requirements, ADRs, Lessons Learned, etc.
* Code comments and docstrings point back to documentation. Especially for bug fixes.
* Finally, bug fixes usually get new Unit Tests. The tests make sure if the bug resurfaces, it gets caught immediately.
I absolutely believe what you're describing: I've seen heard other people talk about this. I just don't experience it myself (I'd like to hope because of the extra steps I'm taking)