Hacker News
by Johnbot 665 days ago
An example of a Mossad-generated file would be the source file plus a bunch of dead code. The dead code consists of lines from the original file repeated in random locations (plus, if you are using an "entropy file", random lines of code that were successful mutations from previous generations of Mossad).

As it turns out, a lot of student code can look this way anyway. Something crazy like 70% of authentic student code can have dead code in assignment submissions.

3 comments

> As it turns out, a lot of student code can look this way anyway. Something crazy like 70% of authentic student code can have dead code in assignment submissions.

Having assessed student code, this does not surprise me. Source code control late at night for students, especially non-CS majors, tends to be variations of "append a number to the end of the function name", e.g. sum1(x, y), sum2(x, y), ... sumTHISREALLYWORKS(x, y).

That said, if dead code was being used to hide plagiarism, which is something I had not considered before, then telling students they would be marked down for dead code would probably be enough to stop it.
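
A rough sketch of how such a dead-code penalty could be checked automatically for Python submissions, assuming "dead code" is narrowed to top-level functions that are defined but never referenced. The helper name `unused_functions` and the sample submission are made up for illustration, and the check would miss unreachable statements or functions referenced only through attributes:

```python
import ast


def unused_functions(source: str) -> list:
    """Return top-level function names that are defined but never referenced."""
    tree = ast.parse(source)
    # Functions defined directly at module level, in source order.
    defined = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    # Every bare name that is read anywhere in the module (calls, arguments, ...).
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return [name for name in defined if name not in used]


# Hypothetical late-night submission in the style described above.
submission = '''
def sum1(x, y):
    return x - y

def sum2(x, y):
    return x + y + 1

def sumTHISREALLYWORKS(x, y):
    return x + y

print(sumTHISREALLYWORKS(1, 2))
'''

print(unused_functions(submission))  # ['sum1', 'sum2']
```

A grader would still want to read the flagged functions, since helper code called only from tests or notebooks would also show up here.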

I mean. Should be doing that anyway. Code doesn’t just exist for the computer, but also for humans who have to maintain it.
> I mean. Should be doing that anyway. Code doesn’t just exist for the computer, but also for humans who have to maintain it.

Harsh! I like to think I am a good lecturer.

Depends on the specification of the assignment. In my case I teach data science, not software development, so the specification is not "bulletproof code that won't break when pytorch releases a new version tomorrow" but rather statistical and data rigour. This is where I spend my time when marking, not on how maintainable the code is.

CS students turn in MUCH better code, but data frequently leaks into their test or validation sets, making the results either meaningless or compromised.
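
One simple form of that leakage, identical rows showing up in both the training and the test split, can be checked mechanically. A minimal sketch, assuming rows are hashable tuples; the function name `leaked_rows` and the toy data are invented for illustration, and this would not catch subtler leaks such as fitting a scaler on the full dataset before splitting:

```python
def leaked_rows(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training split."""
    train_set = set(train_rows)
    return [row for row in test_rows if row in train_set]


# Toy data: the first test row is an exact copy of a training row.
train = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.9, "b")]
test = [(4.9, 3.0, "a"), (5.9, 3.0, "b")]

print(leaked_rows(train, test))  # [(4.9, 3.0, 'a')]
```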

At the end of the day code quality is strongly correlated to grades.

That makes it seem like readability is even more important! I've taught programming to friends and family my entire life (to anyone who wants to learn), and one thing I always focus on is 'telling a story with comments', explaining how, where, and why data flows through the code. At the end, reread your comments and your code and figure out which one is wrong; then refactor.
I'm surprised that large amounts of dead code are neither an obvious-to-machines nor an obvious-to-humans problem or demerit with submitted assignments -- regardless of plagiarism status. I'd especially have thought such a clunky approach would be caught by any decent plagiarism detection software. It makes me wonder if simply feeding a student's assignment into Claude would be more reliable these days by just asking it, "If you remove all the dead code, is the remaining code likely plagiarized?"
How would that pass in the user study? Did the people reviewing the code fail to see dead code scattered across random locations? Feels like it would be obvious as soon as you opened the file.
It would certainly depend to some degree on the complexity of the assignment. But it's also not that unusual for legitimate, non-plagiarized submissions to have dead code.
Sure, but is it not unusual to have "dead code consisting of lines from the original file repeated in random locations"? That would certainly stick out in any other environment (like a professional one).

I didn't study anything related to computers/software/programming in school, so I don't know what level is expected. But if I was tutoring someone and they handed me something with dead code in random locations in it, it would certainly catch my attention.
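
For what it's worth, repeated lines are cheap to count. A rough sketch, assuming exact repeats after stripping whitespace and an arbitrary minimum length of 12 characters to skip trivial lines; `repeated_lines` and the sample are hypothetical, and since legitimate code repeats lines too, this is at best a signal and not proof of anything:

```python
from collections import Counter


def repeated_lines(source: str, min_length: int = 12) -> dict:
    """Map each non-trivial line that occurs more than once to its count."""
    counts = Counter(
        line.strip()
        for line in source.splitlines()
        if len(line.strip()) >= min_length
    )
    return {line: n for line, n in counts.items() if n > 1}


# Made-up sample with two lines repeated out of place.
sample = """\
total = 0
for value in values:
    total = total + value
for value in values:
print(total)
    total = total + value
"""

print(repeated_lines(sample))
# {'for value in values:': 2, 'total = total + value': 2}
```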

I think two things are at play here.

1. Students will frequently just try things until it works, move code around, etc., leading to very messy code.
2. Graders often do not look at individual assignments unless there is a reason to do so, often relying on automated test suites. And when they do look, I'd bet their first reaction is something like "I don't know why they're repeating themselves like this, but my rubric only penalizes them for 5 points here..."