Hacker News
by taude 146 days ago
How is it hallucinating links? The links are direct links to the webpages that they vectorized or whatever as input to the LLM query. In fact, in almost all LLM responses from DuckDuckGo and Google, the links are right there as cited sources that you can click on (I know because I'm almost always clicking on the source link to read the original details, and not the made-up ones).

I would imagine links can be hallucinated because the original URLs in the training data get broken up into tokens - so it's not hard to come up with a URL that has the right format (say https://arxiv.org/abs/2512.01234 - which is a real paper but I just made up that URL) and a plausible-sounding title.
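A minimal Python sketch of that point, assuming the new-style arXiv identifier format (`YYMM.NNNN` or `YYMM.NNNNN`, optional `vN` version suffix): a shape check is trivially passed by pieces stitched together, and it tells you nothing about whether the paper exists. The function names `looks_like_arxiv_url` and `plausible_fake` are hypothetical, made up for illustration.

```python
import re

# Assumed shape of a new-style arXiv abstract URL: two-digit year, two-digit
# month (01-12), a dot, a 4- or 5-digit sequence number, optional version.
ARXIV_ABS = re.compile(
    r"^https://arxiv\.org/abs/(\d{2})(0[1-9]|1[0-2])\.(\d{4,5})(v\d+)?$"
)


def looks_like_arxiv_url(url: str) -> bool:
    """True if the URL has the right *format*; says nothing about existence."""
    return ARXIV_ABS.match(url) is not None


def plausible_fake(yymm: str, seq: int) -> str:
    """Glue arbitrary pieces into a well-formed arXiv URL (hypothetical helper)."""
    return f"https://arxiv.org/abs/{yymm}.{seq:05d}"


if __name__ == "__main__":
    fake = plausible_fake("2512", 1234)
    print(fake)                        # https://arxiv.org/abs/2512.01234
    print(looks_like_arxiv_url(fake))  # True, even though nothing was checked
    print(looks_like_arxiv_url("https://arxiv.org/abs/2513.01234"))  # False: month 13
```

The only way to tell a real citation from a well-formed fake is to actually fetch the page and read it; passing a format check is the easy part.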
Yeah, but the current state of ChatGPT doesn’t really do this. The comment you’re replying to explains why URLs from ChatGPT generally aren’t constructed from raw tokens.
You are absolutely right! The current state of ChatGPT was not in my training data.
How do you explain it, then, when it spits out a link that surprisingly contains the subject of your question in the URL, but the page simply doesn't exist and there isn't even a blog under that domain at all?
Near as I can tell, people just don’t actually check and go off what it looks like it’s doing. Or they got lucky, and the one time they did check, it was right. Then they assume it will always be right.

Which would certainly explain things like hallucinated references in legal docs and papers!

The reality is that for a human to make up that much bullshit requires a decent amount of work, so most humans don’t do it, or can’t do it as convincingly. LLMs can generate nigh infinite amounts of bullshit for cheap (and often more convincing-sounding bullshit than a human can produce without a lot of work!), making them perfect for fooling people.

Unless someone is really good at double-checking things, it’s a recipe for disaster. Even worse, doing the right amount of double-checking often makes them more exhausting than just doing the work yourself in the first place.

I’ve used Claude Code to debug, and sometimes it’ll say it knows what the issue is; then, when I make it cite a source for its assertions, it will do a web search and sometimes spit out a link whose contents contradict its own claim.

One time I tried to use Gemini to figure out 1950s construction techniques so I could understand how my house was built. It made a dubious-sounding claim about the foundation, so I had it give me links and keywords so I could find some primary sources myself. I was unable to find anything to back up what it told me, and then it doubled down and told me that either I was googling wrong or that what it told me was a historical “hack” that wouldn’t have been documented.

These were both recent and with the latest models, so maybe they don’t fully fabricate links, but they do hallucinate the contents frequently.

> maybe they don’t fully fabricate links

Grok certainly will (at least as of a couple months ago). And they weren't just stale links either.

After getting beaten for telling the truth so frequently, who wouldn’t start lying?