Hacker News
by crazygringo 248 days ago
In my experience doing literature super-deep-dives, it hallucinates sources about 50% of the time. (For higher-level literature surveys, it's maybe 5%.)

Of the other 50% that are real, it's often ~evenly split into sources I'm familiar with and sources I'm not.

So it's hugely useful in surfacing papers that I may very well never have found otherwise using e.g. Google Scholar. It's particularly useful in finding relevant work in parallel subfields -- e.g. if you work in physics but it turns out there are relevant math results, or you work in political science and it turns out there are relevant findings from anthropology. And also just obscure stuff -- a random thesis that never got published or cited but the PDF is online and turns out to be relevant.

It doesn't matter if 75% of the results are not useful to me or hallucinated. Those only cost me a few minutes. The other 25% more than make up for it -- they're things I simply might never find otherwise.


So, the exact stuff Google used to be good at.
The exact stuff I now use Kagi for. Finding obscure, relevant PDFs that Google didn't find is literally one of the things that made me switch.
Pretty much, though Google got bad at these things well before LLMs really came onto the scene. We can all debate which project manager was responsible and the month and year things took a downward turn, but IMO the obvious catalyst was that "Barely Good Enough" search creates more ad impressions, especially when virtually all of the bad results you are serving are links to sites that also serve Google-managed ads.
Oh, sure, Google was starting to take a dive almost a decade before LLMs came on the scene.
The main reason Google doesn't find good search results anymore is there are no good search results anymore because there are no websites anymore. You can't do it much better.
Right, Google definitely isn't helping themselves IMO, but the reason search got hard was that it became profitable to become the "winner" of a search query. It's a hostile market that works to actively undermine you.

AI absolutely will have the same problem if it "takes over", except that the websites that win and get your views will not look like blogspam; they will look like (and be) the result of adversarial machine learning.

There was a very clear turning point: when Amit Singhal was kicked out for sexual harassment in the #MeToo era. He was the heart of search quality, but he went too far when he was drinking.
Apple is the only firm that seems to do a good job in preventing itself from falling prey to what leads to the demise of every corp in history.
Nope. I'm talking about the stuff keywords are no good at, and which Google Scholar doesn't tend to surface because it's just not cited much or it's from a different niche.

The fact that LLMs understand your question semantically, not just with keyword matching, is huge.

Another win for big tech: Google has been enshittified to such a point that you can now spin up a machine that consumes 1000x the power to give you a result with coin-toss odds of being totally made up.
That's nothing! Next gen will use the entire power output of a small nation for a week, to tell you a nice cake recipe.
A search query probably uses about 10x more electricity than a matching LLM query. There's enough wiggle-room depending on the assumptions that they might be about even. There is no way search uses 1/1000th of an LLM.
What questions are you asking LLMs where they're wrong 50% of the time?
People love gambling.
What is "it"? GPT-5 auto? GPT-5 Pro? Deep research? These have wildly different hallucination rates.
I use all of the current versions of ChatGPT, Gemini, and Claude.

The hallucination rates are about the same as far as I can tell. It depends mostly on how niche the area is, not which model. They do seem to train on somewhat different sets of academic sources, so it's good to use them all.

I'm not talking about deep research or advanced thinking modes -- those are great for some tasks but don't really add anything when you're just looking for all the sources on a subject, as opposed to a research report.

ChatGPT thinking mode is definitely the best search engine (wrapper) I've ever used, and you should be using it to find sources.
If these rates are known, it would be great for OpenAI to be open about them so customers can make an informed decision.
OpenAI has published a great deal of information about hallucination rates, as have the other major LLM providers.

You can't just give one single global hallucination rate, since the rates depend on the use case. And despite the abundance of information available on how to pick the appropriate tool for a given task, very few people seem to take the time to first recognize that these LLMs are tools, and that you do need to learn how to use them in order to be productive with them.

OpenAI goes into great detail on hallucination rates of GPT-5 models versus o3 in the GPT-5 System Card [1], section 3.7.

[1] https://cdn.openai.com/gpt-5-system-card.pdf#page12

"Known" implies that these rates are consistent and measurable. It seems to me that this is highly unlikely to be the case.