TL;DR: After nine months of chasing weird hallucinations and silent failures in production LLM/RAG systems, we catalogued every failure pattern we could reproduce. The result is an MIT-licensed “Semantic Clinic” with 16 root-cause families and step-by-step fixes.

---

## Why we built it

- Most bug reports just say “the model lied,” but the cause is almost always deeper: retrieval drift, OCR mangling, prompt contamination, and so on.
- Existing docs scatter symptoms and remedies across random blog posts; we wanted one map that shows where the pipeline breaks and why.
- After fixing the same issues across 11 real stacks, we decided to standardise our notes and open-source them.

---

## What’s inside

- 16 root-cause pages (Hallucination & Chunk Drift, Interpretation Collapse, Entropy Melts, etc.).
- A quick triage index: find the symptom → jump to the fix page.
- Each page gives real-world symptoms, metrics to watch (ΔS semantic tension, λ_observe logic flow), a reproducible notebook, and a “band-aid-to-surgery” list of fixes.
- Tiny CLI tools: a semantic diff viewer, a prompt isolator, and a vector compression checker. Everything is plain bash + markdown, so anyone can fork it.

---

## Does it help?

- On our own stacks, the average debug session dropped from hours to ~15 min once we had tagged the root-cause family.
- The first 4 root causes explain ~80% of the bugs we see in the wild.
- So far it has been used on finance chatbots, doc-QA, and multi-agent sims; happy to share war stories.

---

## Call for help

- If you’ve hit a failure that isn’t on the list, open an issue or PR. We especially want examples of symbolic prompt contamination or large-scale entropy collapse.
- Long-term goal: turn the clinic into a self-serve triage bot that annotates stack traces automatically.

---

## Why open-source?

Debug knowledge shouldn’t be paywalled. The faster we share failure modes, the faster the whole field moves (and the fewer 3 a.m. rollbacks we all have to do).

Cheers – PSBigBig / WFGY team