by TheRealPomax 1232 days ago
So, status quo then? This is already the case for educational software that's used to detect plagiarism. People get wrongly flagged, and then you'll have to plead your case.

But the times software like this finds actual problems vastly outnumber the times it doesn't, and when your choice is between "passing kids/undergrads who cheat the system" and "the occasional arbitration", you go with the latter. Schools don't pay teachers anywhere near enough to not use these tools.


Currently the false positive rate for plagiarism detection is far lower. E.g. if I get 500-ish submissions over a school year, then a 1% false positive rate would mean I'd falsely accuse 5 innocent students annually, which isn't acceptable at all, and a 9% FP rate is so high that it's not even worth investigating; do you know of any grader who has the spare time to begin formal proceedings/extra reviews/investigations for 9% of their homework?

For plagiarism suspicions at least the verification is simple and quick (just take a look at the identified likely source, you can get a reasonable impression in minutes) - I can't even imagine what work would be required to properly verify ones flagged by this classifier.
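
The workload arithmetic in this comment can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and it assumes every submission is honest work, so every flag counts as a false accusation.

```python
def expected_false_flags(submissions: int, false_positive_rate: float) -> float:
    """Expected number of honest submissions that get flagged.

    Assumption: all submissions are honest, so each flag is a false positive.
    """
    return submissions * false_positive_rate


if __name__ == "__main__":
    # 500 submissions per year is the figure from the comment above;
    # 1% and 9% are the two false positive rates it compares.
    for rate in (0.01, 0.09):
        flagged = expected_false_flags(500, rate)
        print(f"FP rate {rate:.0%}: about {flagged:.0f} students wrongly flagged per year")
```

At 9% that is roughly nine times the caseload the comment already calls unacceptable at 1%.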

I really wish they had provided their false positive rate for several document lengths, rather than an overall estimate. Because if it dives after, say, 1,500 words, that's a relevant piece of information for its use.

I'm pessimistic, given they chose not to do so.

> I can't even imagine what work would be required to properly verify ones flagged by this classifier.

Yet.

At the same time the classifier is improving, the generative models are improving. It's a classic arms race and this equilibrium is not likely to shift much either way. We are talking about models that approximate human behavior with a high degree of accuracy; I think the goal would be to make them indistinguishable in any meaningful way.

Can you elaborate?

I don't think that this is something that can change through tech advances for the classifiers - in all cases the classifier is just flagging for investigation, it's not sufficient for any action. For plagiarism, appropriate evidence comes from a person comparing the submission with the possible source of plagiarism. For this one, the proper evidence would require getting confirmation that the student actually generated that text - e.g. identifying the exact tool and prompt that was used, or logs from the student's computer showing that this was done, or logs from the text generation service provider. All of those are quite tricky to get and perhaps not even possible.

Given the published true and false positive rates, it's clear that the true positives do not "vastly outnumber" false positives.
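
One way to check a claim like this is a base-rate calculation. The sketch below is illustrative only: the 9% false positive rate is the figure quoted in this thread, while the 26% true positive rate (which I believe is the figure published alongside it) and the 10% share of AI-written submissions are assumptions, and the function name is hypothetical.

```python
def flag_breakdown(submissions, ai_share, true_positive_rate, false_positive_rate):
    """Return (true_flags, false_flags, precision) for a batch of submissions.

    precision is the fraction of flagged submissions that really are AI-written.
    """
    ai_written = submissions * ai_share
    human_written = submissions - ai_written
    true_flags = ai_written * true_positive_rate
    false_flags = human_written * false_positive_rate
    total_flags = true_flags + false_flags
    precision = true_flags / total_flags if total_flags else 0.0
    return true_flags, false_flags, precision


if __name__ == "__main__":
    # ASSUMPTIONS: 500 submissions, 10% AI-written, 26% true positive rate.
    # Only the 9% false positive rate comes from the discussion itself.
    true_flags, false_flags, precision = flag_breakdown(500, 0.10, 0.26, 0.09)
    print(f"true flags:  {true_flags:.1f}")
    print(f"false flags: {false_flags:.1f}")
    print(f"share of flags that are correct: {precision:.0%}")
```

Under these assumed numbers, false flags outnumber true ones by about three to one. With these rates, true flags only overtake false ones once more than roughly a quarter of all submissions are AI-written.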

> This is already the case for educational software that's used to detect plagiarism. People get wrongly flagged, and then you'll have to plead your case.

How often is that the case though? It's been a while since I've had to worry about it, but I thought plagiarism detection generally worked on the principle of looking for the majority of the content being literal matches with existing material out there, with only a few small edits. Unlike using some "AIish" turns of phrase, which a bot wrongly flags in human writing 9% of the time and correctly flags in AI writing at a not much better rate, that is pretty hard to do accidentally.

A long time ago when I was a student, I would run my papers through Turnitin before submitting. The tool would sometimes mark my (completely original) work as high as mid 20% similarity.

As a result, I have taken out quotes and citations to appease it and not have to deal with the hassle.

I expect modern day students will resort to similar measures.

IIRC the marker got the same visualization you used when taking out quotes and citations, one that highlighted that the similar bits were in fact quotes and citations!

Maybe high school is a different matter, but I'm pretty sure even the most technophobic academic knows that jargon, terse definitions and the odd citation overlapping with stuff other people have written make a similarity of at least 10% pretty much inevitable. That's especially true when the purpose of the exercise is to show you understand the core material well enough to cite, paraphrase and compare it, not to generate novel academic insight or show you understood the field so well you didn't need to refer back to the source material. The people they were actually after were the ones who downloaded something off essaybank, removed a couple of paragraphs, rewrote the intro to match the given title and ended up with 80%+ similarity.