by ralferoo 10 hours ago
Not only is the inverse not generally true (as others have pointed out), their examples require several mental leaps.

"Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?"

The word "mother" has no relationship to "son" as far as the model is concerned. The model might be able to infer a proximity relationship between "Tom Cruise" and "Mary Lee Pfeiffer" just because they appear in the same sentence, but expecting the AI to guess that the inverse of mother is son is a bit of a stretch, especially when both are lossy mappings: the relationship is {mother,father} <=> {son,daughter}. If we're going to train models to make that mental leap, we'd have to put up with false results like "Tom Cruise is the daughter of Mary Lee Pfeiffer" unless the model is also supposed to infer that Tom means he can only be a son.
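
To make the "lossy mapping" point concrete, here is a minimal sketch. The table and the `invert` function are made up for illustration; this is not how any model represents these words. It only shows that reversing "mother" gives two candidates until you bring in an extra fact, the gender of the person the original sentence was about.

```python
# Hypothetical lookup table: each relation maps to a SET of possible inverses,
# because {mother, father} <=> {son, daughter} is not one-to-one.
INVERSE = {
    "mother": {"son", "daughter"},
    "father": {"son", "daughter"},
    "son": {"mother", "father"},
    "daughter": {"mother", "father"},
}

def invert(relation, possessor_gender=None):
    """Return candidate inverse relations.

    possessor_gender is the gender of the person the original statement
    was about (Tom Cruise in "Tom Cruise's mother is ..."). Without it,
    the inverse stays ambiguous.
    """
    candidates = INVERSE[relation]
    if possessor_gender == "male":
        return candidates & {"son", "father"}
    if possessor_gender == "female":
        return candidates & {"daughter", "mother"}
    return candidates

print(invert("mother"))          # two candidates: son and daughter
print(invert("mother", "male"))  # narrowed to son only
```

Without the gender argument, "daughter" is just as valid an answer as "son", which is exactly the false result described above.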


Pretraining could be reasonably expected to make it learn that mother/father and son/daughter are inverse relationships and Tom is usually a male name.
I'd argue that that's not an easy task in and of itself, but even if someone adds a special exception, there's still the issue that there are many other types of inverse relationship that we understand, but a machine that's just doing pattern matching can't be expected to understand. For instance "boss" and "employee". For instance "waiter" and "customer". For instance "manager" and "player" (in a football context) or "manager" and "artist" (in a music context) or "manager" and "customer" (in a bank context). And what's the inverse of "customer" now? And so on and so on...
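
A rough sketch of why those examples are awkward, using a hand-written table (the context labels and the `counterparts_of` helper are my own invention, purely illustrative). The counterpart of a role depends on context, and looking it up in reverse gives more than one answer.

```python
# Hypothetical table of (role, context) -> counterpart, taken from the
# examples in the comment above. The context labels are made up.
COUNTERPART = {
    ("boss", "work"): "employee",
    ("waiter", "restaurant"): "customer",
    ("manager", "football"): "player",
    ("manager", "music"): "artist",
    ("manager", "bank"): "customer",
}

def counterparts_of(role):
    """Collect every (context, counterpart) pair for a role, in either direction."""
    found = set()
    for (left, context), right in COUNTERPART.items():
        if left == role:
            found.add((context, right))
        if right == role:
            found.add((context, left))
    return found

print(counterparts_of("manager"))   # three different answers, one per context
print(counterparts_of("customer"))  # waiter or manager, depending on context
```

Somebody had to write that table by hand, which is the problem: every new context needs another row.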

All of this context works because we build up an extensive model of the world through the course of our lifetimes. LLMs don't do that; they pattern match based on statistics.

Somebody would have to decide each of these things is important and create training data sets for each of them. But we implicitly understand so much context about the world that it's practically impossible to document everything we know in a form that a model can actually learn from.

So by extension, are the questions "The first letter of Oman is _" and "O is the first letter of country _" the same for humans?
This is obviously not as symmetrical as the initial problem, but yes, you are expected to be able to easily answer the second after you read the first in a text. That is the premise of quite a lot of early secondary-education tests, and it is also used when learning another language.
But again, you can only make that determination when you know that Oman is a country. LLMs don't know this as a fact, even if they're able to regurgitate a sentence that states this.
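
As a toy illustration of that asymmetry (the `KIND` table and both function names are hypothetical, and the entries are just a handful picked for the example): the forward question needs nothing but the string, while the reverse question needs a typed fact about every candidate.

```python
# Hypothetical mini knowledge base: name -> what kind of thing it is.
KIND = {"Oman": "country", "Oslo": "city", "Ohio": "state", "Peru": "country"}

def first_letter(name):
    # Forward direction: answerable from the string alone.
    return name[0]

def countries_starting_with(letter):
    # Reverse direction: only works if we KNOW which names are countries.
    return sorted(
        name for name, kind in KIND.items()
        if kind == "country" and name.startswith(letter)
    )

print(first_letter("Oman"))
print(countries_starting_with("O"))
```

Drop the `kind == "country"` check and Oslo and Ohio come back as well, which is the regurgitation-without-knowing case.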
We have both Sean Young and Sean Bean. Black swans still exist, and pretraining cannot rely on assumptions, provided you want answers and not hallucinations.