by madhatter999 651 days ago
From the abstract:

“Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).”

Kids who use ChatGPT do actually “significantly” better according to the authors. I don’t know whether “significantly” means statistically significant here, because I haven’t read the methodology, but a 127% increase in performance must count for something. That said, that’s a clickbaity title if I’ve ever seen one.

Edit: Upon closer reading, the increase in performance is statistically significant. Also “access to GPT” in this case is having GPT open while solving the problems, not studying with GPT and then solving the problems, which was my first understanding from the clickbaity title. Results are not terribly surprising in that regard.

5 comments

> Also “access to GPT“ in this case is having GPT open while solving the problems, not studying with GPT and then taking the test

If this is your takeaway, you misread the paper. Students in the GPT groups had access to GPT while working through the practice problems (the control group did not), but not during the exam itself. From the experimental design section of the paper:

> Each session has three parts:

> 1. In the first part, teachers review a topic (e.g., combinatorics) previously covered in the course, and solve one or more examples on the board. This part is identical to a standard high school one-to-many (i.e., teacher-to-students) lecture.

> 2. The second part is an assisted practice period, where students solve a sequence of exercises designed by teachers to reinforce the covered concept. Our randomized intervention (described in more detail below) only affects this second, self-study part.

> 3. The third part is an unassisted evaluation, where students take a closed-book, closed laptop exam. Importantly, each problem in the exam corresponds to a conceptually very similar practice problem from the previous part—this design was chosen to help students practice the key concepts needed to perform well on the exam.

Students with GPT (either form) did better during the practice problem portion and then worse during the actual exam (without GPT access) than students in the control.

Thanks for the clarification. I only looked at it pretty quickly at first. Students might be over-relying on GPT, which means less studying, which in turn means less useful retention in the exam.
They don't do better as far as learning is concerned. Isn't this concerning?

> However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).

The abstract continues:

> That is, access to GPT-4 can harm educational outcomes. These negative learning effects are largely mitigated by the safeguards included in GPT Tutor. Our results suggest that students attempt to use GPT-4 as a "crutch" during practice problem sessions, and when successful, perform worse on their own. Thus, to maintain long-term productivity, we must be cautious when deploying generative AI to ensure humans continue to learn critical skills.

Relying on GPT while solving the problems must be inflating the grades (my use of the word inflate is intentional, because the evaluation does not represent the true knowledge of the student), which then results in lower retention in the long run.
People sophisticated in a field can ask Sonnet or 4o questions that amount to a different way of searching and sometimes even a better one. If you ask a question in a direct, probing, narrow way you can sometimes come out ahead.

Someone educated by the News Feed algorithm (which is what RLHF amounts to: reward for getting a human to click) is going to be the worst kind of wrong: /r/ConfidentlyIncorrect.

+1 insightful

PS was there ever a blog at b7r6.net?

Not yet.

I’m still grinding out what disclosure is and isn’t responsible.

But bet your bum I’ve got a theme picked out.

I didn’t realize anyone cared who I was enough to know I held the domain.

I was just curious; liked your comment, checked your hn profile, visited the domain it listed (shrug)
> Kids who use ChatGPT do actually “significantly” better according to the authors

No, ChatGPT does significantly better than the kids who don't have access to ChatGPT.

Copy-pasting answers from ChatGPT isn't some amazing skill.

You can read the paper itself to get more useful information. The title and article are confusing.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486