Hacker News
by johnhenning 1275 days ago
Looking at the demographics of the study, they only had 47 total participants, 15% of whom were professionals and 62% of whom reported less than 5 years of coding experience (which I would imagine is an underestimate, assuming some people exaggerated their actual experience level). So that means they only had 6-7 participants who worked in industry and, generously, 18 people with more than 5 years of experience. They also don’t report the breakdown of how participants did by experience. One other factor they measure is whether the participant has security experience, but their bar for that is whether they have taken a single security class.
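
For anyone who wants to check the headcounts, here is a rough sketch of the arithmetic. It assumes each quoted percentage applies to all 47 participants, which the comment above does not confirm; the helper name `headcount` is made up for illustration.

```python
# Back-of-the-envelope headcounts from the percentages quoted above.
# Assumption: each percentage is a share of all 47 participants.
TOTAL = 47


def headcount(share: float, total: int = TOTAL) -> float:
    """Convert a reported share (0..1) into an approximate number of people."""
    return share * total


professionals = headcount(0.15)       # share reported as professionals
under_5_years = headcount(0.62)       # share reporting < 5 years of coding
over_5_years = TOTAL - under_5_years  # everyone else, taken at face value

# Round to whole people, since the study only reports percentages.
print("professionals:", round(professionals))
print("under 5 years:", round(under_5_years))
print("5+ years:", round(over_5_years))
```

Under that assumption, the professional group comes out to roughly 7 people and the 5+ years group to roughly 18, so any per-subgroup result would rest on very few participants.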

Given all of this, I don’t think the paper’s conclusion is convincing at all, since they evaluated this on a pool of participants in which the majority were students without much programming experience, while these tools are sold for professional use. I would bet that if the study had a more uniform distribution of experience levels, you would see the “bugginess” trend downwards with more experience. Participants with more years of programming have simply had more time to learn how to use new tools effectively in their workflows.

I definitely tweaked my methods of using Copilot plenty over the past year or so to take advantage of its strengths and avoid its weaknesses.


My concern is that students/novices are going to be using this, without the ability to double-check the output of the tool. It inspires overconfidence, looks okay at the surface level, and bugs go unnoticed. The younger generation using this as a crutch, treating their own creations as a black box, will not have an adequate feedback mechanism to learn from their mistakes. Code quality and performance will deteriorate over time. You, an expert, learned without this crutch. Your use-case is frankly uninteresting.

Amusingly, without careful curation, I'd predict that buggy code will tend to self-replicate and these tools that indiscriminately slurp public code will enter a death spiral because the novices outnumber the experts. It's only a matter of time before viruses are written to propagate through this garbage stream. http://www.underhanded-c.org/

I definitely agree with your point about it being used as a crutch. My criticism was more towards how the authors evaluated AI’s effect on writing secure code. I’m not saying they shouldn’t have student participants, but they should be fully representative across the skill demographics.

To me it’s comparable to a study where you make a general claim about driving ability with lane assist but then 2/3 of the participants only have their learner’s permits.

What is the current feedback mechanism, and will they not use existing feedback mechanisms if available? Professionally, someone should be there to enforce quality and mentor, but students and hobbyists, even without AI assistants, often don't have anyone to say "this is bad, this is best practise" except Stack Overflow.