| HN Mirror

Hacker News


	by taude 146 days ago
	How is it hallucinating links? The links are direct links to the webpages that were vectorized (or whatever) as input to the LLM query. In fact, in almost all LLM responses from DuckDuckGo and Google, the links are right there as cited sources that you can click on. (I know because I'm almost always clicking on the source link to read the original details rather than the made-up summary.)


madcaptenor 146 days ago

I would imagine links can be hallucinated because the original URLs in the training data get broken up into tokens - so it's not hard to come up with a URL that has the right format (say https://arxiv.org/abs/2512.01234 - which is a real paper but I just made up that URL) and a plausible-sounding title.


jjj123 146 days ago

Yeah, but the current state of ChatGPT doesn’t really do this. The comment you’re replying to explains why URLs from ChatGPT generally aren’t constructed from raw tokens.


madcaptenor 146 days ago

You are absolutely right! The current state of ChatGPT was not in my training data.


1718627440 145 days ago

How do you explain it, then, when it spits out a link whose URL conveniently contains the subject of your question, but the page simply doesn't exist and there isn't even a blog under that domain?


lazide 145 days ago

Near as I can tell, people just don’t actually check and go off what it looks like it’s doing. Or they got lucky: the one time they did check, it was right, and then they assumed it would always be right.

Which would certainly explain things like hallucinated references in legal docs and papers!

The reality is that for a human to make up that much bullshit requires a decent amount of work, so most humans don’t do it - or can’t do it as convincingly. LLMs can generate nigh infinite amounts of bullshit for cheap (and often more convincing sounding bullshit than a human can do on their own without a lot of work!), making them perfect for fooling people.

Unless someone is really good at double-checking things, it’s a recipe for disaster. Even worse, doing the right amount of double-checking often makes them more exhausting than just doing the work yourself in the first place.


strange_quark 146 days ago

I’ve used Claude Code to debug, and sometimes it’ll say it knows what the issue is; then, when I make it cite a source for its assertions, it will do a web search and sometimes spit out a link whose contents contradict its own claim.

One time I tried to use Gemini to figure out 1950s construction techniques so I could understand how my house was built. It made a dubious sounding claim about the foundation, so I had it give me links and keywords so I could find some primary sources myself. I was unable to find anything to back up what it told me, and then it doubled down and told me that either I was googling wrong or that what it told me was a historical “hack” that wouldn’t have been documented.

These were both recent and with the latest models, so maybe they don’t fully fabricate links, but they do hallucinate the contents frequently.


exmadscientist 146 days ago

> maybe they don’t fully fabricate links

Grok certainly will (at least as of a couple months ago). And they weren't just stale links either.


lovich 146 days ago

After getting beaten for telling the truth so frequently, who wouldn’t start lying?
