by picometer
741 days ago
Good point - I saw the FLAN anomaly and this didn't occur to me! A good follow-up question would be: why didn't the other models do better on the 2nd-order question? Especially BLOOM and davinci-003, which were middling on the 1st-order question. I agree with your overall criticism of the experimental protocol, though.