by underanalyzer
1100 days ago
Great analysis, props to these students for taking the time to challenge such a sensational headline. In the conclusion they mention my biggest problem with the paper, which is that GPT-4 appears to grade the answers as well (see section 2.6, "Automatic Grading"). In a way it makes perfect sense that GPT-4 can score 100% on a test that GPT-4 also grades. To be clear, the grading GPT-4 has the answers, so it does have more information, but it still might overlook important subtleties in how the real answer differs from the generated answer, due to its own failure to understand the material.
Even this is overstating it, because for each question, GPT-4 is considered to get it "correct" if, across the (18?) trials with various prompts, it ever produces one single answer that GPT-4 then, for whatever reason, accepts. That's not getting "100%" on a test.
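To put rough numbers on that, here is a back-of-the-envelope sketch of "correct if any attempt is accepted" scoring. It assumes each attempt is accepted independently with the same probability `p`, which is a simplification and not something the paper states, and it uses `k = 18` only because that is my uncertain recollection of the trial count. The function name is made up for illustration.

```python
def any_of_k_pass_rate(p: float, k: int) -> float:
    """Probability that at least one of k attempts is accepted.

    Assumes the attempts are independent and each one is accepted with
    probability p (a simplifying assumption, not the paper's model).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Complement of "all k attempts are rejected".
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    k = 18  # hypothetical trial count; the real number may differ
    for p in (0.1, 0.3, 0.5):
        rate = any_of_k_pass_rate(p, k)
        print(f"per-attempt accept rate {p:.0%} -> any-of-{k} rate {rate:.2%}")
```

Under those assumptions, even a 10% per-attempt acceptance rate turns into roughly an 85% "solved" rate, so the headline number says little about how often a single answer is actually right.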