Hacker News
by oblangatas 62 days ago
"AI agents don’t crash, they just quietly give wrong answers" is painfully accurate. Quality RCA is a good direction.

Curious how well the generated hypotheses generalize beyond obvious prompt issues. IMHO lots of failures come from interactions between retrieval, tool selection, and state, more than from a single not-so-good description. Do you find that the clustering surfaces such multi-factor issues, or does it tend to collapse them into prompt-level fixes?