by Aurornis 14 days ago
> “All forces will have a policy that says, ‘Check everything that it produces’.”

Everyone I talk to (including outside of tech) is going through this phase at their companies. It’s not working.

Checking the output seems like a simple request, but the question becomes: Check against what? If the police are making a document that sources from another report that another officer used AI to produce from their notes which were also run through AI and on and on, an inconsistency that leaks in at a previous step will check out when someone reviews the output against the inputs.

We’re all also discovering that many people’s idea of reviewing the output is to skim it and verify that it looks convincing enough. Checking facts is hard and takes time. These people are using AI because they want to work less, not to give themselves extra work.

One can ask: what is the practical difference between “Check everything that it produces” and “Do all the work yourself”?

It’s not typing that’s the bottleneck, at least not often, so this is essentially assuming that you can do all the needed work without actually doing it, which is obviously wishful thinking.

This is definitely the most interesting question in a ton of AI applications. I think folks should really be spending a lot of time on figuring out how to deterministically and reliably check AI outputs, in order to reduce the amount a human has to check, and on building tools that speed up the checking process.

Thinking about all of the fake citations in legal submissions that have come up of late, it seems pretty straightforward to set up a regex that captures all forms in which a cited case might be written (I could be wrong but I'd assume there's some standard variety of formats) and search those against a database (again assuming such a database exists) to ensure they all exist.
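A minimal sketch of that first check, under loud assumptions: the pattern below only handles a simplified "volume reporter page" form with three sample reporters, and `known_citations` is a hypothetical in-memory stand-in for whatever case database actually exists. Real citation formats are more varied than this.

```python
import re

# Simplified pattern: "<volume> <reporter> <page>", e.g. "347 U.S. 483".
# The reporter list is a tiny illustrative sample, not a complete one.
CITATION_RE = re.compile(r"\b(\d{1,4})\s+(U\.S\.|F\.2d|F\.3d)\s+(\d{1,5})\b")


def extract_citations(text):
    """Return every citation-shaped string found in the text, normalized."""
    return [" ".join(m.groups()) for m in CITATION_RE.finditer(text)]


def find_unknown_citations(text, known_citations):
    """Return citations that are absent from the (hypothetical) database."""
    return [c for c in extract_citations(text) if c not in known_citations]


if __name__ == "__main__":
    # Stand-in for a real case database lookup.
    known_citations = {"347 U.S. 483"}
    brief = (
        "See Brown v. Board of Education, 347 U.S. 483 (1954), "
        "and a made-up case, 999 F.3d 12345 (2021)."
    )
    for citation in find_unknown_citations(brief, known_citations):
        print("Not found in database:", citation)
```

This only confirms that a citation exists. It says nothing about whether the case supports the claim it is cited for.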

Then for the tougher problem of making sure that the cited cases say whatever the document citing them says they do, you could have an LLM run through the document, pull out the text with the case name and text about why it's being cited, then read the case and try to determine whether the reason for citing it is valid. Rather than just give a yes/no, you'd put the doc in front of the user and let them jump from citation to citation. On each citation, it'd pop up a card that shows the literal text of why it's being cited, a judgement from the LLM of whether it matches what the case says, and snippets of text from the case as evidence + deeplinks to that text within the case.

Or maybe you wouldn't even want to give the LLM's judgement since people might rely on that without reading, but there's definitely a way to speed up the review.
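One way the per-citation card could be modeled, as a rough sketch. Every name here (`CitationCard`, `EvidenceSnippet`, `render_card`, the `deeplink` format) is hypothetical, and the LLM step that would fill in the fields is left out. The `show_judgement` flag is off by default so the reviewer sees the evidence first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceSnippet:
    text: str       # literal excerpt from the cited case
    deeplink: str   # hypothetical anchor pointing into the case text


@dataclass
class CitationCard:
    citation: str                 # the citation as written in the document
    claimed_reason: str           # literal text saying why it is cited
    snippets: List[EvidenceSnippet] = field(default_factory=list)
    llm_judgement: Optional[str] = None  # optional, may be hidden


def render_card(card, show_judgement=False):
    """Format a card as plain text; the judgement is hidden unless requested."""
    lines = [
        f"Citation: {card.citation}",
        f"Cited for: {card.claimed_reason}",
    ]
    for snippet in card.snippets:
        lines.append(f'Evidence: "{snippet.text}" [{snippet.deeplink}]')
    if show_judgement and card.llm_judgement is not None:
        lines.append(f"LLM judgement: {card.llm_judgement}")
    return "\n".join(lines)


if __name__ == "__main__":
    card = CitationCard(
        citation="347 U.S. 483",
        claimed_reason="<text from the citing document>",
        snippets=[EvidenceSnippet("<excerpt from the case>", "case#para-12")],
        llm_judgement="<model's assessment>",
    )
    print(render_card(card))
```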

I believe OpenEvidence does something like this with medical papers. If you ask it a medical question, it doesn't answer so much as link you directly to the relevant papers so you can read them and determine if they're useful. Avoids all of the potential risks of using an LLM but still hugely valuable and time-saving for docs.

excellent point. it is like saying computers in the 90s.

remember how the bank giving your money to the wrong person was a crime? and then when "the computer" did it, it was just business as usual and you paid more for banking because now they had "computer fraud" insurance?

same thing. cop delivers a false report, jail (hah! i know). now it was "the Ai", so no jail. they will go back and put in rules for the cop to read or something.

and we are making everything worse by the minute. One gov pushes back on Ai nonsense, and ibm/rh comes up with all sorts of lies that would make any engineer or researcher laugh in their faces (federated learning being for privacy instead of cost cutting, or explainable Ai being real and not something bolted on after the inference with extra unexplainable inference, etc.) but that are good enough to fool the regulator.

All this, so people like us can do a job that wasn't that hard in the first place (and in fact was quite comfortable all things considered), just a little bit easier, for companies that are promising to lay us off for productivity gains that aren't even measurable.
my friend, jobs only matter in volume not quality.

why do you think every capital accumulation in the past caused "absentee landlords"? everyone wants the profit without even having to collect the rent.

Ai (or Bangladesh) doesn't have to do the work better than you, just deliver more X, where X is work output div capital input.

Depends, is P = NP?
> Checking the output seems like a simple request, but the question becomes: Check against what?

A colleague of mine circulated "minutes" from a meeting last week. There were only three of us in the meeting (one external service provider, my colleague + me).

There were several items in the "minutes" which I didn't recall being discussed, so I asked him if he'd had AI help. He said AI was filling in the gaps based on its knowledge of other discussions he'd had with it.

Glorious.

Even if you have some AI transcribe in real time, it introduces the kind of subtle mistake, like negating a statement, that is plausible but undetectable in retrospect without the accompanying audio.

I often review the transcriptions that Zoom produces out of curiosity, and this happens constantly. It makes them essentially useless as a statement of record.

This is a great way of capturing the core problem. Fact-checking a document is a difficult skill! Expecting people who've never had to do that before to just start doing it - when these AI tools are supposed to save them time and make life easier - is not reasonable.
> Checking facts is hard and takes time.

This is a reasoning skill, and I fear we are losing it if we do not update our education systems (all over the world) very fast. Children will use AI, and we should see it as a tool. However, children now need to learn how to verify the output of the tool, and this is something they do not do (and they are not incentivized to do so). That needs to change; otherwise the next generation of people leaving high school will be able to use AI but will not have learned how to reason themselves.

This also happens, I guess, in companies: AI is the simple way, while reasoning is the "hard" part that costs energy, and people need to be incentivized to do it. I have no idea how to create that incentive though.

The education system has already lost, and it happened before AI. The problem is that reasoning requires very good literacy skills, and these require a lot of practice to develop, but we stopped forcing students to practice this quite some time ago. A mass of illiterate students has been the reality for several years all over the world.
> We’re all also discovering that many people’s idea of reviewing the output is to skim it and verify that it looks convincing enough.

I mean, over time I've come to believe that most people are just _bad at reading_. If you ask these people to compare two documents, they'll say that they are the same if the wording or surface "feel" is at all similar, even if, read precisely, the two statements say the opposite.

See also: People being generally bad at listening and hearing what they want to instead of anything quantitatively derivable from what the other person said.

None of this is written as an excuse for AI.

Here's a massive document, without any real context as to what thinking went into the points it's making, tell me if it looks ok. Oh, and there's 10 more where that came from.

We're outsourcing the thinking to the recipient.

Yes, it's way easier to create the report now, but it's not being honed down to the crux of the points it needs to make. And the reviewers are expected to what? Up their ability to mentally consume and reason about reports.

I mean, barrier number 1, "did you read it yourself before asking someone else?", seems too high for some.

Never before has the adage "if I took* more time I would have written less" been so true.

*Yes, I know

It's just duplicating the work at that point, because you have to check everything anyway.