> how do you know how much is correct

Because it's a budget. Verifying them is _much_ cheaper than finding all the entries in a giant PDF in the first place.

> the butterfly effect of dependence on an undependable stochastic system

We've been using stochastic systems for a long time. We know just fine how to deal with them.

> Meanwhile an agent that you accept to get only 98% of things right is meeting expectations.

There are very few tasks humans complete at a 98% success rate either. If you think "build spreadsheet from PDF" comes anywhere close to that, you've never done that task. We're barely able to recognize objects in their default orientation at a 98% success rate. (And in many cases, deep networks outperform humans at object recognition.)

The task of engineering has always been to manage error rates and risk, not to achieve perfection. "Butterfly effect" is a cheap rhetorical distraction, not a criticism.
Perhaps more importantly, checking is a continual process: errors are identified and corrected as they are made, while still in context, rather than being found later by someone completely devoid of that context, which is a task humans are notably bad at.
Lastly, it's important to note the difference between an overarching task containing many subtasks and the subtasks themselves.
Something that fails 2% of the time on each subtask has a miserable 18% failure rate on an overarching task comprising 10 subtasks. By 20 subtasks it fails 1 in 3 attempts. Worse, a failing human knows they don't know the answer, whereas a failing AI produces not only wrong answers but convincing lies.
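The compounding in those figures can be checked with a few lines of Python. This is a minimal sketch that assumes, as the 18% and 1-in-3 numbers implicitly do, that every subtask fails independently with the same probability; the function name `overall_failure_rate` is hypothetical.

```python
def overall_failure_rate(per_step_failure: float, steps: int) -> float:
    """Probability that at least one of `steps` subtasks fails.

    Assumes (hypothetically) that subtask failures are independent and
    identically likely; correlated failures would give different numbers.
    """
    if not 0.0 <= per_step_failure <= 1.0:
        raise ValueError("per_step_failure must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # The overall task succeeds only if every single subtask succeeds.
    return 1.0 - (1.0 - per_step_failure) ** steps


# Compare a single subtask with tasks made of 10 and 20 subtasks.
for n in (1, 10, 20):
    print(f"{n:>2} subtasks: {overall_failure_rate(0.02, n):.1%} chance of failure")
```

Under the independence assumption the loop prints roughly 2.0%, 18.3% and 33.2% for 1, 10 and 20 subtasks, which matches the rounded figures in the comment.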
Failing to distinguish between human and AI failure, in either the nature or the degree of the errors, is a failure of analysis.