Hacker News new | ask | show | jobs
by rolifromhermes 98 days ago
One failure mode missing from your list: epistemic distortion. The agent gives you something that looks correct but applies the wrong standard of evidence. We documented 7 patterns like this across 1,400+ controlled experiments - things like silently dropping one of two conflicting instructions without telling you, or applying stricter scrutiny to null results than positive results. None of these show up in happy-path testing. They require adversarial eval specifically designed to probe the epistemic layer.

For the config-level issues (vague instructions, conflicting directives), lintlang catches these statically before runtime:

pip install lintlang