| Opus (and all prior Claude models) still do not correctly answer the Sally question, unfortunately:
Q: Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
A: Okay, let's organize the information we have:
Sally is a girl.
Sally has 3 brothers.
Each of Sally's brothers has 2 sisters.
Now, let's think about this step by step: We know that Sally is a girl, so she is a sister to her brothers.
Each of Sally's brothers has 2 sisters. One of these sisters is Sally, but who is the other sister?
Since Sally has only brothers and no other sisters are mentioned, we can conclude that the other sister for each brother is also Sally. Therefore, Sally has no sisters other than herself.
So, the answer is that Sally has 0 sisters.
https://imgur.com/a/EawcbeL |
GPT-4 with CoT system prompting gets it right:
SYS: "You are a helpful assistant. Think through your work step by step before providing your answer."
USER: "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"
ASSIST: "Let's break down the information given:
1. Sally has 3 brothers.
2. Each brother has 2 sisters.
Given that Sally is a girl, she is one of the sisters. Since each brother has 2 sisters and we know Sally is one of them, there must be one more sister to make the statement true for each brother.
Therefore, Sally has 1 sister."
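For reference, the arithmetic behind the correct answer can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and it assumes all siblings share the same parents, which the puzzle implies but does not state.

```python
def count_sallys_sisters(num_brothers: int, sisters_per_brother: int) -> int:
    """Return how many sisters Sally has.

    Assumption (not stated in the puzzle): all siblings are full siblings,
    so every brother has exactly the same set of sisters.
    """
    if num_brothers < 1:
        raise ValueError("the puzzle needs at least one brother")
    if sisters_per_brother < 1:
        raise ValueError("Sally is a sister, so each brother has at least one")
    # A brother's sisters are all the girls in the family, Sally included.
    girls_in_family = sisters_per_brother
    # Sally's sisters are all the girls except Sally herself.
    return girls_in_family - 1


print(count_sallys_sisters(3, 2))  # prints 1
```

Note that the number of brothers never enters the calculation, which is probably part of why the question trips models up.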
The importance of prompting makes it quite difficult to compare peak model performance, especially since different models reach their peak with different styles of prompts.
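One rough way to account for this is to try the same question under several system prompts and record which ones a model passes. The sketch below is only that: `ask` is a hypothetical callable standing in for whatever client you use, the "plain" prompt is an assumption added for contrast, and `toy_model` is a stub that does not represent any real model's behavior.

```python
import re
from typing import Callable, Dict, List

QUESTION = ("Sally (a girl) has 3 brothers. Each brother has 2 sisters. "
            "How many sisters does Sally have?")

# "cot" is the system prompt quoted above; "plain" is an assumed baseline.
SYSTEM_PROMPTS: Dict[str, str] = {
    "plain": "You are a helpful assistant.",
    "cot": ("You are a helpful assistant. Think through your work "
            "step by step before providing your answer."),
}


def is_correct(reply: str) -> bool:
    # Crude grader: accept any reply that mentions "1 sister".
    return re.search(r"\b1 sister\b", reply) is not None


def passing_prompts(ask: Callable[[str, str], str]) -> List[str]:
    """Return the names of the prompt styles under which `ask` answers correctly.

    `ask(system_prompt, question)` is a hypothetical interface, not a real API.
    """
    return [name for name, system in SYSTEM_PROMPTS.items()
            if is_correct(ask(system, QUESTION))]


def toy_model(system_prompt: str, question: str) -> str:
    # Stub for illustration only: pretends that only step-by-step prompts work.
    if "step by step" in system_prompt:
        return "Therefore, Sally has 1 sister."
    return "So, the answer is that Sally has 0 sisters."


print(passing_prompts(toy_model))  # prints ['cot']
```

Comparing models on their best-scoring prompt rather than on one shared prompt is a design choice; it measures peak performance but hides how sensitive each model is to prompting.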