| HN Mirror



	by lazide 14 days ago
	How would you expect an LLM to produce reasonable decisions on that anyway?


bandrami 14 days ago

"Do these documents contain models or descriptions of (list of devices redacted for HN), or personally identifying information?" would be a great question to be able to automate, since it sucks up a lot of time that could be more profitably spent doing other things. There are costs to both Type I and Type II errors, so deterministic filters only get us so far (which isn't very far).
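
To make the Type I / Type II trade-off concrete, here is a minimal sketch of an expected-cost calculation per document. Every name and number in it (`prevalence`, the error rates, the two cost figures) is a hypothetical placeholder for illustration, not a figure from the system described above.

```python
def expected_cost_per_doc(
    prevalence: float,  # assumed fraction of documents that really contain flagged content
    fp_rate: float,     # Type I: clean document wrongly flagged
    fn_rate: float,     # Type II: flagged content missed
    cost_fp: float,     # assumed cost of one false positive (e.g. wasted review time)
    cost_fn: float,     # assumed cost of one false negative (e.g. a leak)
) -> float:
    """Average cost of classifier mistakes per document screened."""
    false_positive_cost = (1 - prevalence) * fp_rate * cost_fp
    false_negative_cost = prevalence * fn_rate * cost_fn
    return false_positive_cost + false_negative_cost


# Hypothetical comparison: a strict filter (few misses, many false alarms)
# versus a lenient one (few false alarms, more misses).
strict = expected_cost_per_doc(0.10, fp_rate=0.20, fn_rate=0.01, cost_fp=1.0, cost_fn=50.0)
lenient = expected_cost_per_doc(0.10, fp_rate=0.02, fn_rate=0.10, cost_fp=1.0, cost_fn=50.0)
print(f"strict: {strict:.3f}  lenient: {lenient:.3f}")
```

Which setting comes out cheaper depends entirely on the assumed costs and prevalence, which is why a single headline error rate says little by itself.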


crisnoble 14 days ago

If it were incorrect 10% of the time, would it still be of help?


bandrami 14 days ago

Our pre-LLM system does better than that, but any improvement would help us do more lucrative things with our labor hours.


crisnoble 14 days ago

If it is such a critical task, I am left wondering how a tool with even a 1% error rate would reduce the need for human review of all outputs.


lazide 14 days ago

Humans, of course, will screw up at least 1% of the time, at least when judged retroactively.

The fun part is, if you have non-trivial inputs, even if you don’t change anything, you’ll likely get a different set of errors making up that 1% each time, no matter how perfect your judges.
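
A rough way to see this is a toy simulation. The sketch below assumes each review pass makes its mistakes independently and uniformly at random, which is a simplifying assumption and not something claimed in the thread; the batch size and the helper names are hypothetical.

```python
import random


def error_set(n_items: int, error_rate: float, rng: random.Random) -> set[int]:
    """Pick which items one review pass gets wrong (assumed uniform at random)."""
    n_errors = round(n_items * error_rate)
    return set(rng.sample(range(n_items), n_errors))


def expected_overlap(n_items: int, error_rate: float) -> float:
    """Expected number of items that two independent passes both get wrong."""
    return n_items * error_rate * error_rate


rng = random.Random(0)           # fixed seed so the run is repeatable
n_items, rate = 10_000, 0.01     # hypothetical batch size and error rate

pass_a = error_set(n_items, rate, rng)
pass_b = error_set(n_items, rate, rng)
shared = pass_a & pass_b

print(f"errors in pass A: {len(pass_a)}")
print(f"errors in pass B: {len(pass_b)}")
print(f"errors in both:   {len(shared)}")
print(f"expected overlap: {expected_overlap(n_items, rate):.2f}")
```

Under these assumptions each pass gets 100 items wrong, but on average only about one item is wrong in both. If errors cluster on the same hard inputs instead, the overlap would be larger.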

10% seems pretty high, but it really all depends on what you’re evaluating. If it’s all weird edge cases…
