Hacker News
by picometer 741 days ago
Good point - I saw the FLAN anomaly and this didn’t occur to me!

A good follow-up question would be: why didn't the other models do better on the 2nd-order question? This applies especially to BLOOM and davinci-003, which were middling on the 1st-order question.

I agree with your overall criticism of the experimental protocol, though.