by ttul
1015 days ago
Falcon fails this test, and so does GPT-3.5. GPT-4 gets it right. I suspect GPT-4 is just large enough to have developed a concept of counting, whereas the others are not. Alternatively, GPT-4 may simply have memorized the answer from its more extensive training set.