Hacker News
by keiferski 110 days ago
Right now, 30 seconds ago, I asked ChatGPT to tell me about a book I found that was written in the 60s.

It made up the entire description. When I pointed this out, it apologized and then made up another description.

The idea that this is going to lead to superintelligence in a few years is absolute nonsense.

2 comments

The other day I asked Claude Opus 4.6 one of my favorite trivia pieces:

What plural English word for an animal shares no letters with its singular form? Collective nouns (flock, herd, school, etc.) don't count.

Claude responded with:

"The answer is geese -- the plural of cow."

Though, to be fair, in the next paragraph of the response, Claude stated the correct answer. So, it went off the rails a bit, but self-corrected at least. Nevertheless, I got a bit of a chuckle out of its confidence in its first answer.

I asked GPT 5.2 the same question and it nailed the answer flawlessly. I wouldn't extrapolate much about the model quality based on this answer, but I thought it was interesting still.

(For those curious, the answer is 'kine', an archaic plural of cow.)

Of course it’s important to remember that the ability of an LLM to answer an obscure riddle like that has nothing to do with its reasoning abilities, but rather depends on whether the answer was included in its training dataset.
The word is in most online dictionaries for what it is worth. It's also used in Biblical texts, albeit only a handful of times. I do agree it's not a true assessment of an LLM's overall reasoning. No person I have ever asked that riddle to has gotten it correct. Then again, that is probably partly the point of the riddle.

I would like to reiterate that both Claude and GPT answered correctly. It was just bizarre how Claude got an initial, minor detail incorrect, but reasoned enough to get the more difficult answer correct.

I think the point here is that an LLM might not get the correct answer because it hasn't found it yet by scraping Twitter, Facebook, Wikipedia, etc. Artificially limit the training set and it will never, ever get it right.

Whereas, a human of average intelligence, in possession of a dictionary or perhaps just a list of animals, could reason their way through getting the correct answer in finite time. They could probably get it without the list if they really wanted to.
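For what it's worth, the dictionary approach is easy to sketch in Python. The word list below is a tiny hand-picked sample I'm assuming for illustration, not a real dictionary, and the function names are made up:

```python
# Hypothetical sample of (singular, plural) pairs. A real search would
# load these from a dictionary file instead of a hard-coded list.
ANIMAL_PLURALS = [
    ("cow", "cows"),
    ("cow", "kine"),
    ("goose", "geese"),
    ("mouse", "mice"),
    ("ox", "oxen"),
    ("sheep", "sheep"),
]


def shares_no_letters(singular: str, plural: str) -> bool:
    """Return True when the two words have no letter in common."""
    return set(singular.lower()).isdisjoint(plural.lower())


def find_disjoint_plurals(pairs):
    """Keep only the pairs whose singular and plural share no letters."""
    return [(s, p) for s, p in pairs if shares_no_letters(s, p)]


if __name__ == "__main__":
    print(find_disjoint_plurals(ANIMAL_PLURALS))
```

With a full list of singular/plural pairs, the same brute-force check would surface the answer without anyone having seen the riddle before.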

Yes, yes, a thousand times yes! As recently as yesterday I was forced to abandon a conversation with a normie because I couldn't convince her of this fundamental limitation. ChatGPT was "damn near magical" in her opinion. Sigh.

Is it time to update the laws? [1]

1. https://en.wikipedia.org/wiki/Clarke%27s_three_laws

I can replicate a vaguely similar result (gpt-5.2 produces the correct answer immediately, Opus 4.6 "thinks aloud" in the output for two lines and then produces the correct answer), but I worry that 5.2 might be thinking under the hood here.
Is that because this book is obscure and no human has yet written a description that could be scraped?
No, it was just an anthology of papers by a reasonably well-known academic.

The problem is more that instead of reasoning and realizing it didn’t actually know about the book, it just looked up a description of the author and then invented what this book was supposedly about based on the title.

A more intelligent reply would have been something like, “the author appears to be an academic in international relations, but I cannot find this exact book title.” Instead it just repeatedly gave me fake answers.

Is that problem even fixable with what one might call normal means?

I presume it can't be with mere prompting, otherwise every single model would already include "Don't lie and don't make things up. If you don't know an answer, just say so."

It seems to me that the underlying algorithms would be inclined to confabulate simply because that's what LLMs do. But that's a bit beyond my math and programming understanding to say for certain.

Yeah I don’t know enough about LLMs to answer; it seems simple enough to make it check if X exists before pontificating about it. But maybe it isn’t, and there is a reason why.
The problem seems to be that if the LLM doesn't know if X exists it is unable to distinguish that from simply needing to create an answer about X.
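As a rough illustration of the "check if X exists first" idea, here is a minimal Python sketch. It only shows the guard as ordinary code wrapped around a lookup. It says nothing about how an LLM works internally, and the catalog, its single entry, and the function name are all hypothetical:

```python
# Hypothetical catalog; it stands in for whatever lookup a real system
# would use (library database, search index, and so on).
KNOWN_BOOKS = {
    "example anthology": "A collection of papers by Example Author.",
}


def describe_book(title: str, catalog: dict[str, str]) -> str:
    """Answer only from the catalog; say so when the title is not found."""
    key = title.strip().lower()
    if key in catalog:
        return catalog[key]
    # No entry: report that instead of inventing a description.
    return f"I cannot find a book titled {title!r}."
```

The guard is trivial here because the lookup either hits or misses. The hard part raised above is that the model itself has no such clean hit-or-miss signal.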

That's the typical logic flow, innit?