| HN Mirror



	by drapado 676 days ago
	I recently had to check code from some of my students at the university as I suspected plagiarism. I discovered JPlag, which works like a charm and generates nice reports.

4 comments

beeboobaa3 676 days ago

Next time just ask them a few questions about the programming choices they made. Far easier.

BossingAround 676 days ago

How do you deal with disputes? Say a student's code is flagged even though they didn't actually cheat. What then? Do you trust tools over the students' word?

In addition, do things like using Stack Overflow or LLM-generated code count as cheating? Because that is horrible in and of itself, though a separate concern.

ActualTeacher 676 days ago

The output of plagiarism tools should only serve as a hint to look at a pair of solutions more closely. All judgement should be derived entirely from similarities between solutions and not some artificial similarity score computed by some program.

ziddoap 676 days ago

Unfortunately, this is not really what happens in my experience. The output of plagiarism tools is taken as fact (especially at high school levels). Without extraordinary evidence of the tool being incorrect, students have no recourse, even if they could sit and explain the thought process behind every word/line of code/whatever.

Lousy high school.

Indeed, this is exactly what I did.

ablob 676 days ago

If you talk about the written code to the student in question it should become clear whether it was copied or not.

drapado 676 days ago

Well, in this case I noticed the same code copied while grading a project. I then used JPlag to run an automatic check on all the submissions for all the projects. It found many instances where a couple of students had copy-pasted code with the same variable names, comments, etc. It was quite obvious if you looked in detail, and JPlag helped us spot it in multiple files easily.

*edited mobile typos
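
For readers wondering what an automatic check across all submissions might look like in principle, here is a minimal sketch. It is not JPlag's algorithm or API: the function names (`normalized_tokens`, `shingles`, `similarity`, `rank_pairs`), the choice of Jaccard overlap on token 4-grams, and the sample submissions are all assumptions made for illustration. Identifiers and literals are replaced with placeholders so that renaming variables does not hide a copy, and the result is only a ranking of pairs worth reading by hand.

```python
import io
import keyword
import tokenize
from itertools import combinations

# Token types that carry no structural information for this comparison.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Tokenize Python source, replacing names and literals with placeholders."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")    # any identifier, so renaming has no effect
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # keywords, operators, punctuation
    return out


def shingles(tokens, k=4):
    """Return the set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard overlap of token k-grams, between 0.0 and 1.0."""
    sa = shingles(normalized_tokens(a), k)
    sb = shingles(normalized_tokens(b), k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_pairs(submissions, k=4):
    """Score every pair of submissions, most similar first."""
    pairs = [
        (similarity(submissions[x], submissions[y], k), x, y)
        for x, y in combinations(sorted(submissions), 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical submissions: bob's is alice's with renamed variables.
submissions = {
    "alice": "total = 0\nfor n in range(10):\n    total += n\nprint(total)\n",
    "bob": "s = 0\nfor i in range(10):\n    s += i  # sum\nprint(s)\n",
    "carol": "def greet(name):\n    return 'hi ' + name\n",
}

for score, first, second in rank_pairs(submissions):
    print(f"{first} vs {second}: {score:.2f}")
```

A high score here only says that two files share structure; short or heavily templated assignments will score high for everyone, so the pairs still need a human look.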

thi341 676 days ago

An archival video of all coding sessions (locally, hosted by the student), starting with a visible outline of pseudo-code and ending with debugging should be sufficient.

In case of a false positive from a faulty detector this is extraordinary evidence.

wildzzz 676 days ago

We had a professor require us to use git as a timestamped log of our progress. Of course you could fake it but stealing work and basically redoing it piece by piece with fake timestamps is a lot of work for cheaters.

jaimex2 676 days ago

Kinda rare these days with ChatGPT

i_am_proteus 676 days ago

You might be surprised. Many students who use ChatGPT for assignments end up turning in code identical (or nearly identical) to other students who use ChatGPT.

hatmatrix 676 days ago

Surprising because you get different answers each time you ask ChatGPT.

mmcwilliams 676 days ago

Different in an exact string match, but code that is copied and pasted from ChatGPT has a lot of similarities in the way that it is (over) commented. I've seen a lot of Python where the student who "authored" it cannot tell me how a method works or why it was implemented, despite having comments prefixed to every line in the file.

elashri 676 days ago

> (over) commented

From my experience using ChatGPT, it usually removes most of my already written comments when I ask questions about code I wrote myself. It usually gives you outline comments. So unless you are a supporter of the self-documenting code idea, I don't think ChatGPT over-comments.

mmcwilliams 674 days ago

It's obviously down to taste, but what I've seen over and over is a comment per line, which to me is excessive unless it's being requested of absolute beginners.

On top of that, the model can't decide whether it wants the comment on the line before the code or appended to the line itself, so when I see both styles within a single project it's another signal. People generally have a style that they stick with.
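
The two signals described above (roughly one comment per line of code, and a mix of own-line and trailing comments) can be measured mechanically. The sketch below is only an illustration: the function name `comment_style_stats`, the returned keys, and the sample source are hypothetical, and nothing here is a validated detector. It uses Python's standard `tokenize` module so that `#` characters inside strings are not mistaken for comments.

```python
import io
import tokenize

# Token types that do not count as code on a line.
_NON_CODE = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def comment_style_stats(source):
    """Summarize how densely and how consistently Python source is commented."""
    code_rows = set()   # line numbers that contain at least one code token
    comment_rows = []   # line number of every comment token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_rows.append(tok.start[0])
        elif tok.type not in _NON_CODE:
            # A multi-line string covers every row from start to end.
            code_rows.update(range(tok.start[0], tok.end[0] + 1))

    trailing = sum(1 for row in comment_rows if row in code_rows)
    own_line = len(comment_rows) - trailing
    return {
        "own_line": own_line,    # comment on the line before the code
        "trailing": trailing,    # comment appended to the code line
        "comments_per_code_line": len(comment_rows) / max(len(code_rows), 1),
        "mixed_styles": own_line > 0 and trailing > 0,
    }


# Hypothetical sample showing both comment placements in one file.
sample = (
    "# read the input\n"
    "x = int(input())\n"
    "y = x * 2  # double it\n"
    "# print the result\n"
    "print(y)\n"
)

print(comment_style_stats(sample))
```

Plenty of human authors mix both placements or comment heavily, so numbers like these are at most a reason to go and ask the student about the code.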

Joker_vD 676 days ago

Ah yes, good old "Did you even read the essay before handing it in? Next time, please do."

pdntspa 676 days ago

ChatGPT answers don't differ that much without being prompted to do so

serf 676 days ago

yeah but the prompt itself generally adds sufficient randomness to avoid the same verbatim answer each time.

as an example just go ask it to write any sufficiently average function. use different names and phrases for what the function should do; you'll generally get a different flavor of answer each time, even if the functions all output the same thing.

sometimes the prompt even forces the thing to output the most naive implementation possible due to the ordering or perceived priority of things within the requesting prompt.

it's fun to use as a tool to nudge it into what you want once you get the hang of the preconceptions it falls into.

franga2000 676 days ago

MOSS seems to be pretty good at finding multiple people using LLM-generated code and flagging them as copies of each other. I imagine it would also be a good idea to throw the assignment text into the few most popular LLMs and feed their output in as well, but I don't know of anyone who has tried this.

emeryberger 675 days ago

FWIW the attack we describe in the paper works against MOSS, too (that was the original inspiration for the name, “Mossad”).
