by TXTOS 356 days ago

TL;DR

After nine months of chasing weird hallucinations and silent failures in production LLM / RAG systems, we catalogued every failure pattern we could reproduce. The result is an MIT-licensed “Semantic Clinic” with 16 root-cause families and step-by-step fixes.

---

## Why we built it

@ Most bug reports just say “the model lied,” but the cause is almost always deeper: retrieval drift, OCR mangling, prompt contamination, etc.

@ Existing docs scatter symptoms and remedies across unrelated blog posts; we wanted one map that shows where the pipeline breaks and why.

@ After fixing the same issues across 11 real stacks, we decided to standardise the notes and open-source them.

---

## What’s inside

@ 16 root-cause pages (Hallucination & Chunk Drift, Interpretation Collapse, Entropy Melts, etc.).

@ Quick triage index: find the symptom → jump to the fix page.

@ Each page gives: real-world symptoms, metrics to watch (ΔS semantic tension, λ_observe logic flow), a reproducible notebook, and a “band-aid-to-surgery” list of fixes.

@ Tiny CLI tools: semantic diff viewer, prompt isolator, vector compression checker. All plain bash + markdown so anyone can fork.
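
To make the "symptom → fix page" idea concrete, here is a rough Python sketch of a keyword-based triage lookup. It is not the repo's actual tooling (that is bash + markdown). The three family names come from the list above; the keywords, the page paths, and the `Family` / `triage` names are all made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    name: str                 # root-cause family name (taken from the post)
    page: str                 # hypothetical path to the fix page
    keywords: frozenset[str]  # hypothetical symptom keywords


# Illustrative index only: keywords and paths are assumptions.
INDEX = [
    Family("Hallucination & Chunk Drift", "pages/hallucination-chunk-drift.md",
           frozenset({"wrong", "chunk", "citation", "retrieved"})),
    Family("Interpretation Collapse", "pages/interpretation-collapse.md",
           frozenset({"misread", "context", "question", "correct"})),
    Family("Entropy Melts", "pages/entropy-melts.md",
           frozenset({"repetitive", "incoherent", "long", "attention"})),
]


def triage(symptom: str, index=INDEX) -> list[tuple[str, str, int]]:
    """Return (family, page, keyword hits) for matching families, best first."""
    words = set(re.findall(r"[a-z]+", symptom.lower()))
    hits = [(f.name, f.page, len(f.keywords & words)) for f in index]
    hits = [h for h in hits if h[2] > 0]
    return sorted(hits, key=lambda h: -h[2])


if __name__ == "__main__":
    for name, page, score in triage("Answer cites the wrong chunk"):
        print(f"{score} hit(s): {name} -> {page}")
```

A real index would need better matching than bag-of-words overlap, but the shape (symptom text in, ranked fix pages out) is the point.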

---

## Does it help?

@ On our own stacks the average debug session dropped from hours to ~15 min once we tagged the family.

@ The first 4 root causes explain ~80 % of the bugs we see in the wild.

@ Used so far on finance chatbots, doc-QA, multi-agent sims; happy to share war stories.

---

## Call for help

@ If you’ve hit a failure that isn’t on the list, open an issue or PR. We especially want examples of symbolic prompt contamination or large-scale entropy collapse.

@ Long-term goal: turn the clinic into a self-serve triage bot that annotates stack traces automatically.
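
For a sense of what "annotates stack traces" could look like, here is a minimal Python sketch, assuming plain regex rules. Nothing here exists in the repo yet: the `RULES` patterns and the `annotate` function are hypothetical, and the labels are just the example causes mentioned earlier (retrieval drift, OCR mangling, prompt contamination).

```python
import re

# Hypothetical rules: (pattern, label). The patterns are guesses for illustration.
RULES = [
    (re.compile(r"\bocr\b|unicode|decod", re.I), "OCR mangling"),
    (re.compile(r"retriev|top_k|similarity", re.I), "retrieval drift"),
    (re.compile(r"system prompt|injected", re.I), "prompt contamination"),
]


def annotate(trace: str) -> str:
    """Append a '# clinic: ...' tag to every line that matches a rule."""
    out = []
    for line in trace.splitlines():
        tags = [label for pattern, label in RULES if pattern.search(line)]
        out.append(f"{line}  # clinic: {', '.join(tags)}" if tags else line)
    return "\n".join(out)


if __name__ == "__main__":
    print(annotate("ValueError: retriever returned 0 docs\nDone."))
```

Regexes will obviously miss the subtle cases; the bot we have in mind would need something smarter, which is part of why we are asking for real failure examples.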

---

## Why open-source?

Debug knowledge shouldn’t be pay-walled. The faster we share failure modes, the faster the whole field moves (and the fewer 3 a.m. rollbacks we all do).

Cheers – PSBigBig / WFGY team