by moyix
1774 days ago
I find demos like yours hard to reason about, because you can often tweak the prompt until the model demonstrates understanding of something it didn't seem to understand before. I wanted to see whether it understood that humans and dogs can't have kids, and it does. Prompt is in [], generated text is in {}.
–––––––––––––––––––––––––––––––––
[This is a question and answer series intended to demonstrate the logical reasoning capabilities of an AI system. The AI can understand ambiguity and will ask for more detail in such cases.
Q: What's the capital of Georgia?
A: That's ambiguous. Do you mean the country or the US state?
Q: If Albert is a pig and all pigs are blue, what colour is Albert?
A: Albert is a blue pig.
Q: If Albert is a pig and all pigs are blue, is Albert happy?
A: There is no logical connection between Albert's happiness and his color.
Q: If Alice is a dog and Bob just finished high school, can they have kids together?
A:]{ No.}
[Q: Why can't they have kids together?
A:]{ Because Alice is a dog and Bob just finished high school.}
[Q: Why is that a problem?
A:]{ Because Alice is a dog.
Q: Why is that a problem?
A: Because dogs can't have kids with humans.
Q: Why can't dogs have kids with humans?
A: Because dogs are animals and humans are humans.
Q: Why are humans different from dogs?}
–––––––––––––––––––––––––––––––––
So does it understand after all? Or is trying different prompts a form of p-hacking here?

My guess would be the latter, if only because there's zero consistency at play: if true understanding were there, the wording of the prompt shouldn't matter.
A consistent model wouldn't depend on the precise order or wording of the prompts. Here's my result using your notation:
Screenshot: https://ibb.co/Njg3BkC
So my money is on p-hacking. Maybe lowering the temperature or top-p helps, but last time I checked it didn't do much, so ¯\_(ツ)_/¯ I guess?
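
For anyone unsure what those two knobs do, here is a minimal Python sketch, assuming a toy three-token vocabulary with made-up logits. The function name `next_token_probs` and all the numbers are hypothetical, and this is not the sampling code of any particular API; it only illustrates the usual definitions of temperature scaling and nucleus (top-p) filtering.

```python
import math

def next_token_probs(logits, temperature=1.0, top_p=1.0):
    """Toy illustration: apply temperature, then nucleus (top-p) filtering."""
    # Temperature rescales the logits before softmax; values below 1 sharpen
    # the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of most-likely tokens whose
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens; every other token gets probability 0.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

# Hypothetical logits for three candidate answers: " No.", " Yes.", " Maybe."
logits = [2.0, 1.0, 0.5]
default = next_token_probs(logits)
sharper = next_token_probs(logits, temperature=0.5, top_p=0.9)
```

Lowering either setting concentrates probability on the answer that was already most likely, so repeated runs of the same prompt agree more often. Neither setting changes which answer is most likely for a given prompt, which fits the observation that it doesn't do much for sensitivity to prompt wording.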