Hacker News
by hirvi74 110 days ago
The other day I asked Claude Opus 4.6 one of my favorite trivia pieces:

What plural English word for an animal shares no letters with its singular form? Collective nouns (flock, herd, school, etc.) don't count.

Claude responded with:

"The answer is geese -- the plural of cow."

Though, to be fair, in the next paragraph of the response, Claude stated the correct answer. So, it went off the rails a bit, but self-corrected at least. Nevertheless, I got a bit of a chuckle out of its confidence in its first answer.

I asked GPT 5.2 the same question and it nailed the answer flawlessly. I wouldn't extrapolate much about the model quality based on this answer, but I thought it was interesting still.

(For those curious, the answer is 'kine', an archaic plural of cow.)

2 comments

Of course it’s important to remember that the ability of an LLM to answer an obscure riddle like that has nothing to do with its reasoning abilities, but rather depends on whether the answer was included in its training dataset.
The word is in most online dictionaries, for what it's worth. It's also used in Biblical texts, albeit only a handful of times. I do agree it's not a true assessment of an LLM's overall reasoning. No person I have ever asked that riddle to has gotten it correct. Then again, that is probably partly the point of the riddle.

I would like to reiterate that both Claude and GPT answered correctly. It was just bizarre how Claude got an initial, minor detail incorrect, but reasoned enough to get the more difficult answer correct.

I think the point here is that an LLM might not get the correct answer because it hasn't found it yet by scraping Twitter, Facebook, Wikipedia, etc. Artificially limit the training set and it will never, ever get it right.

Whereas, a human of average intelligence, in possession of a dictionary or perhaps just a list of animals, could reason their way through getting the correct answer in finite time. They could probably get it without the list if they really wanted to.
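
To make the "list of animals" approach concrete, here is a rough sketch of that brute-force search in Python. The word list is a tiny hand-picked sample and the function names are made up for illustration; a real attempt would load singular/plural pairs from an actual dictionary.

```python
def shares_no_letters(singular: str, plural: str) -> bool:
    """Return True when the two words have no letter in common (case-insensitive)."""
    return set(singular.lower()).isdisjoint(plural.lower())


# Hypothetical sample data: (singular, plural) pairs, including archaic plurals.
ANIMAL_PLURALS = [
    ("cow", "cows"),
    ("cow", "kine"),
    ("goose", "geese"),
    ("mouse", "mice"),
    ("louse", "lice"),
    ("ox", "oxen"),
]


def find_disjoint_plurals(pairs):
    """Keep only the pairs whose singular and plural share no letters."""
    return [(s, p) for s, p in pairs if shares_no_letters(s, p)]


if __name__ == "__main__":
    for singular, plural in find_disjoint_plurals(ANIMAL_PLURALS):
        print(f"{singular} -> {plural}")
```

The check itself is trivial; the hard part for a human (or a model) is having the archaic plural in the candidate list in the first place.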

Yes, yes, a thousand times yes! As recently as yesterday I was forced to abandon a conversation with a normie because I couldn't convince her of this fundamental limitation. ChatGPT was "damn near magical" in her opinion. Sigh.

Is it time to update the laws? [1]

1. https://en.wikipedia.org/wiki/Clarke%27s_three_laws

I can replicate a vaguely similar result (gpt-5.2 produces the correct answer immediately, Opus 4.6 "thinks aloud" in the output for two lines and then produces the correct answer), but I worry that 5.2 might be thinking under the hood here.