by alexmcc81 1780 days ago
If you read the article, the authors directly address this and use datasets of machine-translated articles as controls.

People who don't speak English natively could use machine translation, and so could people who plagiarize. How do they distinguish between the two (if you don't mind saving me from digging into the research)?
Phrases like "AI" and "big data" already have well-established translations in almost every major machine translation system. You'd have to deliberately thesaurus your way through the text to get those odd substitutions 99% of the time.
But wouldn't the error rate be the same for both legitimate and plagiarized texts? How would these errors distinguish between the two cases?