by troupo
5 days ago
> You can't distinguish between a machine that says "here, look at these 170 results, 10% of them are highly serious problems that you should address

The machine doesn't say that. It says "Here are 170 completely correct and verified results". You have to check and verify all of those results yourself, and on any given day anywhere from 0% to 100% of them can be incorrect.

> I assume you've come to this conclusion based on some reasoning, but you're not sharing it in this response AFAICT.

The reasoning comes from actually working with AI tools. And the reasoning can be seen in the actual comment this thread started from: https://news.ycombinator.com/item?id=48434824
|
> In a regulated industry 90% false positive rate is indistinguishable from 100% failure rate
So defending that position on the basis of it not actually being a 90% failure rate would mean you shouldn't have taken it in the first place. The fact that the LLM lies about its failure rate is nearly irrelevant; the machine could output the literal string "The following is 90% likely to be a false positive: " followed by the LLM output.
The reasoning in the comment that started the thread is "it's annoying to redo human review". Your position, as I understand it, is that there is no business value, or even negative value, in a tool that spits out a list of potential issues of which 10% are real issues with your business. This is what I fail to understand. Such a list would be an incredibly useful first step towards any audit and would save loads of money. Why not use it?