Hacker News
by onlyrealcuzzo 521 days ago
Gemini is the leading model with the lowest hallucination rate: https://www.visualcapitalist.com/ranked-ai-models-with-the-l...

I would expect that number to go down from 1.3% to below 1% over the course of the year.

There's always a chance what you're reading is wrong - due to purposeful deception, negligence, or accident.

Realistically, hardly anything is 100% accurate besides math.

4 comments

I think people really don't understand the effort, care and risk that goes into producing quality reporting.

I work with investigative reporters on stories that take many months to produce. Every time we receive a leak there is an extensive process of proving public interest before we can even start looking at the material. Once we can see it, we have to be extremely careful with everything we note down to make sure that our work isn't seen as prejudiced if legal discovery happens. We're constantly going back and forth with our editorial legal team to make sure what we're saying is fair and accurate. And in the end, the people we're reporting on are given a chance to refute any of the facts we're about to present. Any mistakes can result in legal action that can ruin the lives of reporters and shut down companies.

Now, imagine I were to go to a reporter who has spent 6 months working on a story about, for example, how a high-profile celebrity sexually assaulted multiple women, how the royal family hides their wealth and are exempt from laws, or how multinational corporations use legal loopholes to avoid paying taxes, and said, "oh, 1% of people reading this will likely be given some totally made up details".

Given that stories often have more than a million impressions, this would leave tens of thousands of people with potentially libellous "hallucinations".
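
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes every impression is one reader who sees an independently generated summary, which is a simplification. The function name and the impression counts are made up for illustration; the 1% and 1.3% rates are the ones quoted in this thread.

```python
def expected_misinformed_readers(impressions: int, error_rate: float) -> int:
    """Expected number of readers shown a made-up detail.

    Assumption: each impression is a separate reader, and each reader
    independently gets a hallucinated summary with probability error_rate.
    """
    return round(impressions * error_rate)


if __name__ == "__main__":
    # Hypothetical impression counts; rates are the ones quoted in the thread.
    for impressions in (1_000_000, 3_000_000):
        for rate in (0.01, 0.013):
            n = expected_misinformed_readers(impressions, rate)
            print(f"{impressions:>9,} impressions at {rate:.1%}: about {n:,} readers")
```

At exactly one million impressions and 1% this comes to ten thousand readers, so "tens of thousands" holds once a story gets a few million impressions.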

It simply should not be allowed.

LLMs have their place, for sure, but presenting the news is not it.

Although I agree with every single sentence you've said, we've seen in the past decade how only a very small percentage of people actually care about the content of the news. Everyone just discusses and gets their information from the headlines, so this is a natural consequence of "let's just summarize it to a couple of sentences since nobody reads it anyways".
The Gemini models themselves may score well on this, but Google's feature implementations are a whole other thing. AI Overviews frequently take untrustworthy search results (like a fan fiction plot outline for Encanto 2) and turn those into confidently incorrect answers. https://simonwillison.net/2024/Dec/29/encanto-2/
And doesn't bringing in The Associated Press solve this problem? No need for the AI to decide what is trustworthy or not. For the vast majority of people everything The Associated Press publishes is trustworthy.
1.3% isn't great. I'd rather just go, and pay, directly to trusted news sources. Everyone has different tolerance for falsehoods and priorities I guess.
What's the error rate for human journalists? Based on my experience, I'd guess it's much higher than 1.3%.
As others have already pointed out, feeding these models news articles isn't magically going to make them any more accurate. These hallucinations are going to be on top of any errors in the data sources.

I'm not replying to point that out, I think others have done a better job. It's mostly that this conversation made me think of this classic Babbage quote that I've always enjoyed.

"On two occasions I have been asked, – 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question"

Except when that happens, a clarification is almost always added at the bottom of the article ("This article was amended on [date]. An earlier version said xxx" or some variation thereof). You're not gonna get a second push notification from an AI summary saying "Oopsies, the previous notification was wrong". Once it's out, it's out, and that sort of damage is difficult to repair.
Yes, but that's going to be on top of the ~1.3% hallucination rate (largely, there's always some very small chance it hallucinates the truth when the article had it wrong - but basically not worth considering).
Anything other than 0% is borderline immoral. Imagine sending a push notification to somebody's phone with a completely made-up headline summary. Even if it happens once in a hundred times, that's too much. Things like that slowly but surely erode trust and make it harder and harder to trust anything that's generated by AI, especially when it comes to news, where trustworthiness is essential, and probably the main reason people pay for news. See for example https://www.bbc.co.uk/news/articles/cge93de21n0o
This is a ridiculous standard. News headlines at the moment would have an error rate wildly above 1.3%. The problem behind the articles about Apple having trouble with LLM headlines is that the on-device model is weak and it's trying to compress too much into too few characters. I'd guess the chance of Gemini incorrectly summarising an article to be almost 0%.
Have you ever read a news article on a subject where you have expertise and knew it was inaccurate? The news is probably more inaccurate than you think.

I bet you think the news is accurate all other times. It’s called “Gell-Mann Amnesia”

You’d have to pay quite a bit to get journalists to answer your questions specifically.

The whole thing isn’t about generating news articles, it’s about getting the model up to date on facts so it can synthesize a newspaper for you. I’d say it’s a way to get journalists to be journalists again instead of clickbait composers - as long as the model doesn’t inject clickbait there itself. I don’t trust Google to not do it sometime, but they aren’t doing it now and the infrastructure is being made for others to consume when Gemini suffers from inevitable enshittification.

> You’d have to pay quite a bit to get journalists to answer your questions specifically.

This isn't what I meant. I pay directly for subscriptions/donations to news organizations that employ journalists who do this original reporting. I don't want a middle man that just messes it up. This goes for LLMs and for free news sites that don't do much more than summarize original reporting. I've seen more than a few times where they inject opinions, mess up facts or put focus on what was originally a small side point in the article.

> There's always a chance what you're reading is wrong - due to purposeful deception, negligence, or accident.

I am quite certain my personal hallucination level is more than 1.3%. Obviously we want our machines to be better than us, but my doctor once said folic acid is not a vitamin.