by perihelions
1850 days ago
I very briefly skimmed Reddit for examples. This one is /r/physics' 4th-most-upvoted post from this week, and it hasn't been detected yet. It algorithmically replaces random words with synonyms (poorly); that's not the machine translation I promised, but it serves the same goal of resisting phrase matching.
“We knew that the first direct image of a black hole would be groundbreaking,” said Kazuhiro Hada of the
“We knew that the first uninterrupted image of a black hole would be revolutionary,” says Kazuhiro Hada of the
Original is [0] and plagiarized is [1] (linked indirectly because the other URL is probably blacklisted on here, and possibly malicious).
This one was the cleanest of several examples I found*. I think the technique is widespread and broadly successful, based on my anecdotal experience. It's easy to find a diversity of examples in smaller Reddit subs, the ones with less paranoid moderation and spam AI settings. The machine translation examples are far harder to detect (to me); I'll update you if I discover one again in the future. The ones I found several years back appear to no longer exist.
[0] https://www.jpl.nasa.gov/news/telescopes-unite-in-unpreceden...
[1] https://old.reddit.com/r/Physics/comments/njbrec/data_from_1...
* (Because I could reliably identify the original document, and because the edit of a direct quotation from a named individual is an air-tight example of fraud).
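To make the phrase-matching point concrete, here is a rough Python sketch using the two quoted sentences above (cut off at the same place as in the comment). It compares an exact substring check with word-set and word-trigram overlap. The helper names `tokens`, `shingles`, and `jaccard` are made up for illustration, and nothing here is meant to describe how Reddit or Google actually detect copied text; it only shows why an exact match fails while a fuzzier comparison still sees the resemblance.

```python
import re

# The two quotes from the comment, truncated exactly where the comment truncates them.
original = ("We knew that the first direct image of a black hole would be "
            "groundbreaking, said Kazuhiro Hada of the")
rewritten = ("We knew that the first uninterrupted image of a black hole would be "
             "revolutionary, says Kazuhiro Hada of the")


def tokens(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    toks = tokens(text)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Exact phrase matching: the rewritten sentence does not contain the original.
exact_match = original in rewritten

# Fuzzy comparisons: shared vocabulary and shared three-word runs.
word_overlap = jaccard(set(tokens(original)), set(tokens(rewritten)))
shingle_overlap = jaccard(shingles(original), shingles(rewritten))

print("exact match:", exact_match)
print("word overlap:", round(word_overlap, 2))
print("trigram overlap:", round(shingle_overlap, 2))
```

Three swapped words are enough to break the exact match, while most of the vocabulary and a good share of the three-word runs survive. Any cutoff for calling two texts "the same" would be an assumption to tune, and a full machine-translation round trip would presumably lower these scores further.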
|
Someone should try running the OP through machine translation into some other language and then back into English. I wonder if that would produce the effect you're describing. I might try this later if I remember!
Edit: I tried running the first few paragraphs through Google Translate into German and back into English. I'll post the two versions as replies to this comment. I also did this via Dutch (which unsurprisingly came back closer to English), Italian, and Russian.
It seems clear that you are right. The translations are good enough for blogspam and can be used to evade detection. For example,
https://www.google.com/search?q=%E2%80%9C%5BMercury+concentr...
doesn't find the original. Other sentences I tried do get picked up by Google as references to the OP, but this can be circumvented. For example, this sentence from the English->German->English text gets picked up correctly:
https://www.google.com/search?q=The+discovery+is+worrying+si...
But the corresponding sentence from the English->Russian->English text does not:
https://www.google.com/search?q=This+finding+is+worrying+bec...