| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kundan_s__r 161 days ago

This is a very pragmatic take. The “90% accuracy is a liability” line resonates — in high-stakes systems, partial automation often costs more than it saves.

What I like here is the field-level confidence gating instead of a single document score. That maps much better to real failure modes, where one bad value (amount, date, vendor) can invalidate the whole record.

One question I’m curious about: how stable are the confidence thresholds over time? In similar systems I’ve seen, models tend to get confidently wrong under distribution shift, which makes static thresholds tricky.

Have you considered combining confidence with explicit intent or scope constraints (e.g., what the system is allowed to infer vs. must escalate), rather than confidence alone?

Overall, this feels much closer to how production systems should treat AI — not as an oracle, but as a component that earns trust incrementally.