by imoverclocked 36 days ago
I don't trust facts from LLMs. When I am searching for something, I usually want to find primary sources. As soon as a number is involved, I do my best to not even look at the AI output.

Even though the result is often good and combines information from multiple sources, it can also get things wrong by combining information from different eras or just plain outdated advice. AFAICT, without primary sources, the result is for entertainment purposes only.

10 comments

> When I am searching for something, I usually want to find primary sources.

And therein lies the rub; for years now Google's search results have returned useless SEO garbage. For now, it definitely seems like an LLM answer is better than what was being returned and I guess this is the reason why Google ripped it out.

An LLM answer is not "better", it's in a completely different category. LLM answers can be useful for topics where you can easily verify a fact (e.g., if you ask for a Linux command and it gives you one, you can run it and see if it did what you wanted), or for topics which are more opinion than pure fact ("list some trade-offs between decision A and decision B"). But when you want information that's provided by some authoritative source, you want to see it from that source.

Google Search has been terrible for a long time. But you could still dig through it and find those primary sources. That is, in my opinion, the primary purpose of a search engine. Replacing it with what an LLM has invented based on ingesting both reliable and unreliable sources is not viable for a large category of things. The main way we can judge the reliability of something is to look at where it comes from. If I'm looking for, say, official US job market statistics, whether I trust the numbers I find depends on whether I find them published on a US government website or on a random person's blog. A number presented to me by a chat bot would not let me judge, so it's useless.

The best a language model could possibly do, by definition, is to find websites and link them to me, letting me judge their credibility. But then it's just a worse search engine.

> But you could still dig through it and find those primary sources. That is, in my opinion, the primary purpose of a search engine.

And you are a small minority. People go to google to get answers, not to look for articles in order to look for answers in the articles.

Yeah all you need is answer-shaped text. Why would the truth of that answer matter at all?
Personally I think I've developed a pretty good sense of when a question is easy enough that I can just trust the AI overview, and when I need to dig deeper. Google's original AI overviews were not reliable enough to ever trust, but now they are usually accurate summaries of the cited sources.

Job market statistics are actually probably a strong point for the AI overview. I just Googled 'us job market last month' and got an AI overview that accurately summarized a New York Times article for qualitative information ("surprisingly strong 115,000 jobs", "no-hire, no-fire"), followed by accurately summarizing the official Bureau of Labor Statistics website for raw stats, followed by some other stuff I didn't check. Not everyone would prefer The New York Times' take, but the citation prominently displays their name and logo, so you can tell what you're getting.

Weak points are when the topic is obscure enough that the AI overview conflates two different things or overgeneralizes, or trusts the wrong sources.

If Google can't filter out the SEO spam from their results, why do you think they did it for the LLM training data?
The training process literally ingests the majority of text on the internet, including a huge volume of SEO garbage, and seeks to create a self-consistent compressed model of that. This is totally imperfect of course but is also likely more truthful than the median Google result, because of the incentive for self-consistency and coherence that is created by the reward function as well as during RL.

Imagine that you had 1,000 years to read every Google result on a particular topic, and literally infinite patience. You would read a lot of rubbish but ultimately you are a smart person, you would figure out the underlying truth and likely produce something that is more valuable than the average or even the sum of the parts.

Honestly this feels like wishful thinking. If they could do it at all, they could do it to fix search.
Why are you assuming that they want to filter out the SEO spam?
It's a new frontier and people have not targeted it yet?
You can ask them to cite their sources. It's very good practice to do so, and to check those sources, because I've found that about 30-40% of the time their source doesn't support their answer at all.
If it's wrong 2 out of 5 times, why even waste your time going to it in the first place? That's a massive failure rate.
Because it finds the sources much quicker than I would have been able to on my own, and I can then synthesize them into data I know is correct, as correct as any human-generated data can be of course.
But was that because their search was so bad that it took you that long to find the sources?
No, it's usually because it finds sources that I would not have even thought to search for in the first place.

Agentic AI has its faults, but one thing I've found it to be very good at is surfacing the "unknown unknowns": things I didn't know I should have searched for but that are directly relevant to my problem.

Because way more than three out of five Google results are SEO garbage or sponsored crap. The bar has been set extremely low by Google, a 60% validity rate sounds magical.
Indeed, enshittification has been so thorough, internet search is virtually useless now.
If I'm going to an LLM (as with websearch before it), it's usually because I don't know the answer, don't have anyone close to me that knows the answer, and can't pay anyone (or don't know who to pay) for the answer. In other words, my failure rate without the LLM would be 100%.
The problem is that everything you have said renders you unable to determine the validity of the answer provided.

Sometimes that is fine, sometimes it is not

It's much easier to determine the truth of an answer than it is to come up with that answer yourself. This is analogous to the P vs. NP problem or the recognition vs. recall problem: it is much easier to recognize and verify a correct answer than it is to recall or generate it yourself.

I've got a pretty solid algorithm for checking correctness: I ask the LLM for its sources, I try to find 3-5 independent ones (that are not just copying each other's answers), and if they all agree, that's very likely to be the correct answer. Simple math here: if you have 5 sources and they are each 60% likely to be correct, then an LLM choosing at random from them would have a 60% success rate, while someone checking all 5 of them for agreement would have a 1 - (0.4^5) ≈ 99% chance of being correct. It's a good algorithm for doing other things like verifying scientific papers, too: you look for independent research groups that have all reproduced the same findings.
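
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the five sources err independently of each other, which real sources often don't, and the function names are made up for illustration. Strictly speaking, 1 - 0.4^5 is the chance that at least one of the five sources is right; the sketch also prints the chance that all five are right, for comparison.

```python
def p_at_least_one_correct(n_sources: int, p_correct: float) -> float:
    """Chance that at least one of n independent sources is correct."""
    # Every source is wrong with probability (1 - p_correct) ** n_sources.
    return 1.0 - (1.0 - p_correct) ** n_sources


def p_all_correct(n_sources: int, p_correct: float) -> float:
    """Chance that all n independent sources are correct."""
    return p_correct ** n_sources


if __name__ == "__main__":
    n, p = 5, 0.6  # the numbers used in the comment above
    print(f"single source correct:   {p:.1%}")
    print(f"at least one of {n} right: {p_at_least_one_correct(n, p):.1%}")
    print(f"all {n} right:             {p_all_correct(n, p):.1%}")
```

The gap between the last two figures is why the "not just copying each other" condition matters: agreement only counts as evidence when the sources are actually independent.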

I did the same thing with ten-blue-links websearch as well, and hope this would be the habit of anyone else too. (Although I know it wasn't, because I worked on Google websearch 15 years ago, on a project to increase the credibility of search results, and we did cafeteria UX studies about "What makes a credible result?" and everybody said "Because it appears as the top result on Google.")

Because being right 60% of the time with minimal work is still amazing, as long as one accounts for the failure rate correctly.

Say I want to look up some game from my childhood, which I barely remember any details for. Going to google and trying is likely going to be very difficult unless I happen to get lucky with some key element. But if an LLM can get it right even a minority of the time, it can lead to me quickly finding the game I'm looking for.

This does depend upon the ability to evaluate the answer, like checking against a source or some other option where you can tell a good answer from a bad one. If you can't, then it does become much more dangerous. Perhaps that is part of the reason AI seems to empower experts more than novices in some domains?

> Because being right 60% of the time with minimal work is still amazing, as long as one accounts for the failure rate correctly.

Please don't ever go into aerospace. Or healthcare. Or engineering. Or pretty much anything that actually matters.

I don't find it nearly that bad. If I really need factual information, it will generally go off and read the data from primary sources anyway. So unless it's really misunderstanding context, you're getting the data from the source.
It really depends on the task. General knowledge from Wikipedia: great. For more specific things that need any thought, or for technical fields outside of software, his numbers are pretty close to mine.
The problem too, is that we're all using different tools with different experiences -- there isn't one "AI". And if you're not paying for it, you're getting some real bad experience.
The point is the sources will disappear without the traffic/ad revenue.
With Google returning lists full of SEO spam, 2 out of 5 is quite good. If you know something better than that, I'd love to hear it.
If I have to read the sources anyway, why not just have the model give me the links themselves? You know, like search engines already do?
Search engines don’t do that any more - they just give you a bunch of SEO spam sites, now mostly filled with plausible slop. Answers from search are _less_ reliable than answers from an LLM now.

I worry that the LLMs are just the equivalent of a ‘lagging indicator’ of web quality though - that they will also soon be overwhelmed with the sheer volume of plausible nonsense that is the web now, just like search engines are.

Model collapse everywhere.

If the LLM is capable of providing good citations, then those citations could be returned in the same format as traditional search engines, not the new, LLM generated content first format. If they aren't capable of providing good citations, then the suggestion I was replying to is incorrect (and you'd have no way of knowing if they were right or not)
In general users don't like to have to follow citations, even if they should. They'd rather have an answer right in front of them, even if there's a good chance that it's wrong.

Google, like most consumer product companies, designs for the majority. Citations are a niche feature for the 5-10% of users that like to do their own research. The majority just wants an answer, which has been the direction Google's gone in since Knowledge Panels and the Answer OneBox came out in 2012.

That might make sense (at least on the first order, second order effects would still be horrible) if the LLM generated answer was reliably correct. It isn't.
Yes, but this is much more effort than a traditional search result that has a relevant quote from the source right there.
ChatGPT is the only bot that reliably cites sources (through Web search mode).

The other bots either make up links or simply don't provide any information that is distinguishable from the LLM predictive output.

Ironically Gemini is also very bad at this, while it should have been the best at Web search.

Gemini also does something very patchy, which is to provide "links" which are in fact GET queries into classic Google search. I'm guessing they did it this way because the links generated/hallucinated by the LLM were too unreliable.

All of Google AI Mode is sourced.
Yes, and those sources often contradict the AI summary if you follow them (or if you know anything about the topic).
A common pattern:

Type your question in Android/Chrome search bar:

"Is …?"

AI Overview on the search results page:

"No…"

Click through to the AI mode tab/"Dive deeper with AI" CTA:

"Yes…"

I love when I read the source link and it says the exact opposite of what the AI summary said.

Sorry, no, I hate that.

A lot of the time they hallucinate the sources, too.
Don't they all do that?

I know that DeepSeek has links for every chain it makes where you can read the source, and it's actually a good thing to check on that.

Asking an LLM to cite sources just leads to hallucinated sources, same as any other attempt to make it explain its thinking process. It doesn't have actual visibility into its internal processes, just rationalizes an explanation.
I never ask DeepSeek to cite sources, it just does. And I check every time; it always corroborates what is being said.
If it even exists.
Even before the AI era I slowly became less and less successful with Google searches. Everything non-trivial or specific that I looked for turned into a chore, and I quickly gave up.

LLMs that can supply valid links give me a completely different variety of results. Either I am too dumb to search manually, too impatient, or Google search is just broken, but Gemini usually gives me something I can work with. I just wish I could blacklist some sources like Medium.

Check out Kagi. You can blacklist sites. You can also weight certain sites higher than others. I've been using it for almost a year at this point. When I'm forced to use Google at work, I am legitimately less effective at finding the information I need.
Google has been going downhill for a decade now.

I've been paying for Kagi for like four years. I like it but also resent that it's something I pay for now when I remember how good Google was 20 years ago.

Google search is just broken.
Maybe SEO-maxxers will finally leave it alone now if the median consumer trusts the corporate models
-site:medium.com in the search bar

This will remove any results from there for you.

Alternatively, site:news.ycombinator.com would search this website explicitly.
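
If you'd rather script it than type the operators each time, here is a rough Python sketch that builds such a query. The function name and parameters are made up for illustration; the only things taken from the comment above are the `site:` and `-site:` operators.

```python
from urllib.parse import urlencode


def build_search_url(query, exclude_sites=(), only_site=None):
    """Build a Google search URL using site: and -site: operators.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    terms = [query]
    if only_site:
        # Restrict results to a single site.
        terms.append(f"site:{only_site}")
    # Exclude each unwanted site from the results.
    terms.extend(f"-site:{site}" for site in exclude_sites)
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


if __name__ == "__main__":
    print(build_search_url("llm citations", exclude_sites=["medium.com"]))
    print(build_search_url("ai overview", only_site="news.ycombinator.com"))
```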

> Even though the result is often good

From the past hundreds of Google searches I've done where I got an AI summary, I'd say the result is actually rarely good. At the very least 80% of the outputs contain critical mistakes, often exactly about the specific thing you're asking.

It's all slop. Look at the first two examples in their own announcement: fitness and wellness slop from websites like "top 11 exercises to do when you work from home", and god damn sneaker drops and what bloody influencers are saying about some celeb-endorsed sneaker. Jesus christ
Sometimes I use chatgpt thinking mode for searches when I expect there will be a lot of noise. "What are some in-depth reviews for <some book I've heard of>"

Have you tried explicitly asking for links to primary sources?

I see ChatGPT traffic to my website for hallucinated pages.
It means they actually clicked the link. That's what you're supposed to do!
It means that they put my name on things that I have never said about topics that I did not even cover.
Let’s say I’m asking “what’s the latest in Hormuz?”. What is the primary source for this?

For most things I research, there are only secondary sources, reporting on an event, a trend…

Sounds like a good question to ask Google (… only partially joking)

There are many primary sources depending on exactly what you are looking for. Shipping/port manifests or even stats are often findable. People in the region witnessing first-hand what is happening. If you are interested in political views then people who are in charge of policy and control resources in the region. Etc etc…

If you want a summary then, yes, you want a journalist or another source that looks at primary sources and has some knowledge of the region to start with to help give context to a specific situation.

Just wanted to check you really meant primary source.

I don’t understand how you can learn anything then. News is off limits for you, history, sciences (unless you do the experiment yourself) as well. Math might be possible, at least you can check the proof yourself without having to travel anywhere. At least one can see the earth isn’t flat by getting on a plane, but one has to squint a bit.

More seriously, to summarise, I don’t believe you live by this principle, you would live under a rock and I wouldn’t be chatting with you.

You have made a strange logical jump. By answering your question you seem to have gleaned more about me than seems reasonable.

I did state, “when I’m searching for something,” which (in my view) is different than simply being curious or watching/reading the news. One does need to be careful with what news they consume.

I usually don’t search for my news.

> One does need to be careful with what news they consume

100% agreed, it has become very difficult to stay balanced and not end up in some ideological echo chamber.

> “when I’m searching for something,” which (in my view) is different than simply being curious or watching/reading the news

Yeah OK, that’s nifty. Still, one search requires traveling and meeting people to hear from their own mouths the statement the book or newspaper reported, for example. How many searches a year do you manage?

> Still, one search requires traveling

It doesn’t.

Yeah pretty much.

I have seen it hallucinate things confidently but that is usually when it has no direct sources to pin down the output.

I think you/we are in the minority. I’m surrounded by parents that start sentences with “ChatGPT told me…” or “I asked AI and…”.

We’re often talking about something that the literature refutes, but the LLM was trained on a bunch of public content from resources such as whattoexpect.com, full of terrible parenting advice.

People didn’t bother with sources and research before, they don’t bother now, AI is just a magical thing for them.

I don't trust facts from humans. When I am searching for something, I usually want to find direct sensor readings. As soon as a number is involved, I do my best to not even look at the human output.

Even though the result is often coherent and confidently synthesizes information from multiple experiences, it can also hallucinate, suffer from recency bias, or accidentally merge memories from different decades. AFAICT, without access to the underlying telemetry, human responses are for entertainment purposes only.