by emeryberger 677 days ago
JPlag, like similar plagiarism detectors, is vulnerable to attack. We outline the attack in this paper and show its effectiveness against JPlag and another widely used plagiarism detector, Moss. Note that this was written in 2020, in the pre “CheatGPT” era!

https://arxiv.org/abs/2010.01700

Mossad: Defeating Software Plagiarism Detection

Breanna Devore-McDonald, Emery D. Berger

Automatic software plagiarism detection tools are widely used in educational settings to ensure that submitted work was not copied. These tools have grown in use together with the rise in enrollments in computer science programs and the widespread availability of code on-line. Educators rely on the robustness of plagiarism detection tools; the working assumption is that the effort required to evade detection is as high as that required to actually do the assigned work.

This paper shows this is not the case. It presents an entirely automatic program transformation approach, Mossad, that defeats popular software plagiarism detection tools. Mossad comprises a framework that couples techniques inspired by genetic programming with domain-specific knowledge to effectively undermine plagiarism detectors. Mossad is effective at defeating four plagiarism detectors, including Moss and JPlag. Mossad is both fast and effective: it can, in minutes, generate modified versions of programs that are likely to escape detection. More insidiously, because of its non-deterministic approach, Mossad can, from a single program, generate dozens of variants, which are classified as no more suspicious than legitimate assignments. A detailed study of Mossad across a corpus of real student assignments demonstrates its efficacy at evading detection. A user study shows that graduate student assistants consistently rate Mossad-generated code as just as readable as authentic student code. This work motivates the need for both research on more robust plagiarism detection tools and greater integration of naturally plagiarism-resistant methodologies like code review into computer science education.

4 comments

As someone who actually used JPlag as a university TA, I think if the students are smart enough to implement this, they're probably smart enough to do whatever assignment we've asked of them (unless there's an easy-peasy program to do the transformation, but I don't think that's the case here).

The usage of the tool is basically a deterrent against a very low-hanging cheating fruit for students (some still tried and thought changing the variable names would help them...)
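To illustrate why renaming variables alone tends not to help, here is a minimal sketch of identifier normalization. It is an assumption-laden toy, not JPlag's or Moss's actual implementation: it only shows the general idea that a detector comparing token structure rather than raw text sees two renamed programs as identical. The function name `normalized_tokens` and both sample programs are made up for illustration.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str) -> list[str]:
    """Return the token stream with identifiers and literals replaced by placeholders."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Keep keywords (def, for, return, ...) but collapse every identifier to "ID".
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        # Comments, newlines, and indentation tokens are ignored.
    return out


# Hypothetical submissions: the second is the first with every name changed.
original = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
renamed = "def f(items):\n    s = 0\n    for i in items:\n        s += i\n    return s\n"

print(normalized_tokens(original) == normalized_tokens(renamed))  # True
```

Real detectors go well beyond this (for example, by matching subsequences rather than requiring whole-file equality), so treat the comparison at the end as a demonstration only.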

If you read the paper, you'll see that the attack is entirely feasible to implement by hand (we did this ourselves but do not report on it in the paper). It's a pretty mechanical process. A bit of trial and error will get the job done; it's a hell of a lot easier than most assignments.
Here's the relevant quote that speaks to your statement:

  Mossad thus defies the conventional wisdom that defeating plagiarism detection
  is difficult or requires significant programming ability. The techniques that
  underlie Mossad could be implemented manually, relying on only the most basic
  understanding of programming language principles, letting them evade detection
  by both plagiarism detectors and some degree of manual inspection...

The usual defense against these is to ask students to explain their submitted work. Randomly generated dead code would likely be even more difficult for the students to explain.

Though a counterargument to this would be that teachers don't have time to interview every student. If Mossad is so good that teachers can't pick out the objectively suspicious subset, they might need to subjectively pick a random sample, with varying amounts of personal bias involved.

Yup. I sort of independently discovered this mechanism, and not just for "cheating," but for group work. Didn't even have to do individual interviews.

It was simple: I let students work in groups to do coding stuff (it's an intro type of class with students of varying skill levels). I had them work on a project together all they wanted, letting them know that it would be turned in about a month or so before the end of the semester. I would review the projects and then, in class, they would INDIVIDUALLY be quizzed on their own team's project, down to questions like:

"You have a function blahblah, explain what it does. What would happen if I passed it X?"

Forces them to work together and sort of study together. Kind of puts a bit more pressure on the less knowledgeable, but probably worth it.

> A user study shows that graduate student assistants consistently rate Mossad-generated code as just as readable as authentic student code.

Do you have any small examples of a program that was transformed/generated with Mossad that we could compare against the original? As far as I can tell, the paper just has a really tiny example function.

An example of a Mossad-generated file would be the source file plus a bunch of dead code. The dead code consists of lines from the original file repeated in random locations (plus, if you are using an "entropy file", random lines of code that were successful mutations from previous generations of Mossad).

As it turns out, a lot of student code can look this way anyway. Something crazy like 70% of authentic student code can have dead code in assignment submissions.
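From the grading side, one cheap heuristic that follows from this description is to count non-trivial lines that occur more than once in a file. The sketch below is only that: `repeated_lines` and its `min_length` threshold are hypothetical, and since authentic submissions often contain repetition too, a hit is a prompt for a closer look rather than evidence of anything.

```python
from collections import Counter


def repeated_lines(source: str, min_length: int = 8) -> dict[str, int]:
    """Map each non-trivial line that appears more than once to its count."""
    counts = Counter(
        line.strip()
        for line in source.splitlines()
        # Skip blank and very short lines such as "}" or "else:".
        if len(line.strip()) >= min_length
    )
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical snippet in which one statement has been duplicated.
sample = (
    "total = total + price\n"
    "count = count + 1\n"
    "total = total + price\n"
    "return total\n"
)
print(repeated_lines(sample))  # {'total = total + price': 2}
```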

> As it turns out, a lot of student code can look this way anyway. Something crazy like 70% of authentic student code can have dead code in assignment submissions.

Having assessed student code, this does not surprise me. Source code control late at night for students, especially non-CS majors, tends to be variations of "append a number to the end of the function name", e.g. sum1(x, y) sum2(x, y) ... sumTHISREALLYWORKS(x, y).

That said, if dead code was being used to hide plagiarism, which is something I had not considered before, then telling students they would be marked down for dead code would probably be enough to stop it.
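If dead code is going to cost marks, part of the check can be automated. The following is a rough sketch for Python submissions using the standard `ast` module; the helper names `unreachable_lines` and `unused_functions` are made up here, and both checks are deliberately shallow (statements directly after a `return`/`raise`/`break`/`continue`, and functions whose name is never mentioned anywhere else in the file).

```python
import ast

TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def unreachable_lines(source: str) -> list[int]:
    """Line numbers of statements that follow a terminator in the same block."""
    found = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda or conditional expression body
            for i, stmt in enumerate(block):
                if isinstance(stmt, TERMINATORS):
                    found.extend(s.lineno for s in block[i + 1:])
                    break
    return sorted(found)


def unused_functions(source: str) -> list[str]:
    """Names of functions that are defined but never referenced in the file."""
    tree = ast.parse(source)
    defined = [
        n.name for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    referenced = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    referenced |= {n.attr for n in ast.walk(tree) if isinstance(n, ast.Attribute)}
    return [name for name in defined if name not in referenced]


# Hypothetical submission with one leftover function and one stray statement.
sample = (
    "def sum1(x, y):\n"
    "    return x + y\n"
    "    print('done')\n"
    "\n"
    "def sumTHISREALLYWORKS(x, y):\n"
    "    return x + y\n"
    "\n"
    "print(sumTHISREALLYWORKS(1, 2))\n"
)
print(unreachable_lines(sample))  # [3]
print(unused_functions(sample))   # ['sum1']
```

This will miss plenty (code behind a condition that is always false, a function that only calls itself, variables assigned but never read), so it would complement a rubric rather than replace reading the code.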

I mean. Should be doing that anyway. Code doesn’t just exist for the computer, but also for humans who have to maintain it.

> I mean. Should be doing that anyway. Code doesn’t just exist for the computer, but also for humans who have to maintain it.

Harsh! I like to think I am a good lecturer.

Depends on the specification of the assignment. In my case I teach data science, not software development, so the specification is not "bullet proof code that won't break when pytorch releases a new version tomorrow" but rather statistical and data rigour. This is where I spent my time when marking, not on how maintainable the code is.

CS students turn in MUCH better code, but frequently data is leaking into tests or validation sets etc. making the results either meaningless or compromised.
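As a concrete, simplified picture of the leakage problem, the sketch below checks whether any row appears in both a training and a test split. The function `leaked_rows` and the toy data are assumptions for illustration; it only catches exact duplicate rows and says nothing about subtler leaks such as fitting a scaler on the full dataset before splitting.

```python
def leaked_rows(train, test):
    """Return the rows (as tuples) that appear in both splits."""
    return sorted(set(map(tuple, train)) & set(map(tuple, test)))


# Hypothetical splits in which one row ended up on both sides.
train = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.9, "b")]
test = [(6.2, 2.9, "b"), (5.9, 3.0, "b")]

print(leaked_rows(train, test))  # [(6.2, 2.9, 'b')]
```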

At the end of the day, code quality is strongly correlated with grades.

That seems like readability is even more important! I've taught programming to friends and family my entire life (to anyone who wants to learn), and one thing I always focus on is 'telling a story with comments', explaining how, where, and why data flows through the code. At the end, reread your comments and your code and figure out which one is wrong; then refactor.

I'm surprised that large amounts of dead code are neither an obvious-to-machines nor an obvious-to-humans problem or demerit with submitted assignments -- regardless of plagiarism status. I'd especially have thought such a clunky approach would be caught by decent plagiarism detection software. It makes me wonder if simply feeding a student's assignment into Claude would be more reliable these days by just asking it, "If you remove all the dead code, is the remaining code likely plagiarized?"

How would that pass in the user study? Did the people reviewing the code fail to see dead code scattered across random locations? Feels like it would be obvious as soon as you opened the file.

It would certainly depend to some degree on the complexity of the assignment. But it's also not that unusual for legitimate, non-plagiarized submissions to have dead code.

Sure, but is it not unusual to have "dead code consisting of lines from the original file repeated in random locations"? That would certainly stick out in any other environment (like a professional one).

I didn't study anything related to computers/software/programming in school, so I don't know what level is expected. But if I was tutoring someone and they handed me something with dead code in random locations in it, it would certainly catch my attention.

I think two things are at play here.

1. Students will frequently just try things until something works, move code around, etc., leading to very messy code.
2. Graders often do not look at individual assignments unless there is a reason to do so, often relying on automated test suites. And when they do look, I'd bet their first reaction is something like "I don't know why they're repeating themselves like this, but my rubric only penalizes them for 5 points here..."

In an educational setting, plagiarism tools are probably most wanted by lecturers, but least useful. Do they teach every individual differently? If not, then it is no surprise if elementary ideas are expressed in very similar ways. So some cases of very similar solutions are bound to happen, hopefully without casting suspicion on anyone absent proof of plagiarism.