| From the abstract: “Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).” Kids who use ChatGPT actually do “significantly” better, according to the authors. I don’t know whether “significantly” means statistically significant here, because I haven’t read the methodology, but a 127% increase in performance must count for something. That said, that’s a clickbaity title if I’ve ever seen one. Edit: Upon closer reading, the increase in performance is statistically significant. Also, “access to GPT” in this case means having GPT open while solving the problems, not studying with GPT and then solving the problems, which is how I first understood the clickbaity title. The results are not terribly surprising in that regard. |
If this is your takeaway, you misread the paper. Students in the GPT groups had access to GPT while working through the practice problems (the control group did not), but not during the exam itself. From the experimental design section of the paper:
> Each session has three parts:
> 1. In the first part, teachers review a topic (e.g., combinatorics) previously covered in the course, and solve one or more examples on the board. This part is identical to a standard high school one-to-many (i.e., teacher-to-students) lecture.
> 2. The second part is an assisted practice period, where students solve a sequence of exercises designed by teachers to reinforce the covered concept. Our randomized intervention (described in more detail below) only affects this second, self-study part.
> 3. The third part is an unassisted evaluation, where students take a closed-book, closed laptop exam. Importantly, each problem in the exam corresponds to a conceptually very similar practice problem from the previous part—this design was chosen to help students practice the key concepts needed to perform well on the exam.
Students with GPT (either form) did better during the practice problem portion and then worse during the actual exam (without GPT access) than students in the control.