| HN Mirror

We're assuming per earlier hypothetical that it has a 10% correctness rate. You had said

>In a regulated industry 90% false positive rate is indistinguishable from 100% failure rate

So defending that position on the basis of it not actually being a 90% failure rate would mean you shouldn't have taken it in the first place. The fact that the LLM lies about its failure rate is nearly irrelevant; the machine could output the literal string "The following is 90% likely to be a false positive: " followed by the LLM output.

The reasoning in the comment that started the thread is "it's annoying to redo human review". Your position as I understand it is that there is no or negative business value to a tool that spit out a list of potential issues of which 10% are real issues with your business. This is what I fail to understand. This would be an incredibly useful first step towards any audit and would save loads of money. Why not?