Hacker News
by madcaptenor 146 days ago
I would imagine links can be hallucinated because the original URLs in the training data get broken up into tokens - so it's not hard to come up with a URL that has the right format (say https://arxiv.org/abs/2512.01234 - which is a real paper but I just made up that URL) and a plausible-sounding title.
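To illustrate the point about format: a purely syntactic check accepts the made-up URL above without complaint, because it only looks at the *shape* of the string. The sketch below assumes arXiv's current identifier scheme (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional version suffix like `v2`). `looks_like_arxiv_url` is a hypothetical helper for illustration, not any official arXiv API.

```python
import re

# Assumed shape of a new-style arXiv abstract URL: YYMM.NNNN(N), optionally with
# a version suffix such as "v2". Matching this says nothing about existence.
ARXIV_ABS_URL = re.compile(r"https://arxiv\.org/abs/\d{4}\.\d{4,5}(?:v\d+)?")


def looks_like_arxiv_url(url: str) -> bool:
    """True if `url` has a plausible arXiv format; does NOT check the paper exists."""
    return ARXIV_ABS_URL.fullmatch(url) is not None


if __name__ == "__main__":
    # The made-up URL from the comment passes the format check just fine.
    print(looks_like_arxiv_url("https://arxiv.org/abs/2512.01234"))  # True
    print(looks_like_arxiv_url("https://arxiv.org/abs/not-a-paper"))  # False
```

A generator that has learned this shape can produce strings that pass such a check all day long. Only fetching the page tells you whether the paper is real.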

Yeah, but the current state of ChatGPT doesn’t really do this. The comment you’re replying to explains why URLs from ChatGPT generally aren’t constructed from raw tokens.
You are absolutely right! The current state of ChatGPT was not in my training data.
How do you explain it, then, when it spits out a link that surprisingly contains the subject of your question in the URL, but the page simply doesn't exist and there isn't even a blog under that domain at all?
Near as I can tell, people just don’t actually check and go off what it looks like it’s doing. Or they got lucky, and when they did check once it was right. Then they assume it will always be right.

Which would certainly explain things like hallucinated references in legal docs and papers!

The reality is that for a human to make up that much bullshit requires a decent amount of work, so most humans don’t do it - or can’t do it as convincingly. LLMs can generate nigh infinite amounts of bullshit for cheap (and often more convincing sounding bullshit than a human can do on their own without a lot of work!), making them perfect for fooling people.

Unless someone is really good at double-checking things, it’s a recipe for disaster. Even worse, doing the right amount of double-checking often makes them even more exhausting than just doing the work yourself in the first place.
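One narrow slice of that double-checking can be automated: confirming that a cited link resolves at all. The sketch below uses only Python's standard `urllib` and sends an HTTP HEAD request. The helper names `url_resolves` and `broken_links` are hypothetical. Note the assumptions: some servers reject HEAD (for example with a 405), which this sketch reports as a failure. A page that exists can also still fail to say what the citation claims, so this catches invented URLs, not misrepresented ones.

```python
import urllib.error
import urllib.parse
import urllib.request


def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP(S) HEAD request for `url` ends in a 2xx status.

    urlopen follows redirects, so a redirect to a live page counts as success.
    Servers that reject HEAD will show up as False in this sketch.
    """
    if urllib.parse.urlparse(url).scheme not in ("http", "https"):
        return False  # only web links are checked here
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # HTTPError (e.g. 404), URLError (e.g. DNS failure) and timeouts are
        # all OSError subclasses; ValueError covers malformed URLs.
        return False


def broken_links(urls: list[str]) -> list[str]:
    """Return the subset of `urls` that did not resolve."""
    return [u for u in urls if not url_resolves(u)]
```

Running `broken_links` over every URL in an LLM-drafted document is cheap. It still leaves the harder part, reading each source to confirm it supports the claim, to a human.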