Hacker News
by jtbayly 4 days ago
A lot of "plagiarism" is not plagiarism. Feed stuff you wrote into those tools and it will call you a plagiarist every day because you wrote something similar to the person you learned it from.

I don't know about this case, but a lot of these kinds of cases truly are witch-hunts. It's not at all like the reproducibility crisis and faked data and images.


The very few cases that result in sanctions are generally horrendously flagrant.

With another professor I caught a flagrant case in a student thesis, and we faced attacks from the university administration because the student had a stellar transcript (also not the positive signal some might think). Punishment was almost nonexistent.

It's difficult for me to imagine what it would take to get a doctoral thesis revoked.

> It's difficult for me to imagine what it would take to get a doctoral thesis revoked.

Personal grudges. Academia is full of them.

You need more than that. No university is going to revoke anything without very good reasons; they have too much to lose. Their first action is always to try to bury the case.

Different leadership.

If some in your experience erred on the side of leniency, then it stands to reason that others might err just as egregiously in the opposite direction.

In fact, your anecdote suggests erring is the norm. We should thus expect punishments to be inappropriate in one direction or another. An appropriate punishment seems rather unlikely.

I too enjoy creating bell curves from a single datum.

The anecdote is meant to illustrate, not to substitute for a full data set.

Universities and other similar large institutions usually err systematically in one direction - that which protects the institution.

No, that doesn't stand to reason at all.

> It's difficult for me to imagine what it would take to get a doctoral thesis revoked.

No respect for the plagiarist physicist, but an easy way to control what media representatives of scientific disciplines get to say publicly is to start out with what amounts to "academic compromat" (scientific fraud, plagiarism, ...).

Did this physicist / media star recently say something controversial?

I mean, why did the system let him pass as a physicist, and why did it let him rise through the media ranks?

> Did this physicist / media star recently say something controversial?

Not really. This is the consequence of an investigation by some journalists about a decade ago, and an audit that lasted for almost 2 years.

> I mean, why did the system let him pass as a physicist, and why did it let him rise through the media ranks?

He is a smooth talker and by all accounts good at popularizing science. He does well in interviews and is easy for journalists to deal with. There have always been controversies, but media thrive on those.

> I don't know about this case,

They compiled a document with the source material side-by-side https://v42.arretsurimages.net/fichiers/documents/2024-08-02...

This goes well beyond accidentally triggering a plagiarism detector.

> Feed stuff you wrote into those tools and it will call you a plagiarist every day because you wrote something similar to the person you learned it from.

The examples in the article use very distinctive wording. One or two occurrences would be forgivable as coincidence or inspiration. An entire document full of examples points to something else.

It seems like that should be the case, yet when I listen to the same group of people over a period of time, I often find that those unfamiliar with a concept or solution on day 1 end up repeating it as if it were their own a few weeks later. When I was younger I tended to assume there was an element of intentional theft. Now I'm not sure that being able to trace the origin of ideas is natural, or a prerequisite to learning, when those ideas may have bounced around people for a long time before they understood their significance.

The plagiarism in the document was more significant than that.

This wasn't a couple of cases of the same words or word pairs being used.

Sure, a common explanation would be intentional copying in the same session; less likely is intentional copying via eidetic memory. But how much of a spectrum could there be in the middle, for memory that would result in repeating something in a "plagiarized" form months later, etc.?

People say how obvious the parlor trick is when they look at small LLMs. Well, I've seen the same parlor trick in students who get good grades but seem weak at reasoning from fundamentals. It seems quite possible to me that in some cases we are now going after them because the environment changed. At much earlier points we did actually value the people who could recite, even if somewhat brokenly, because we lacked random-order recital tools.

You’re wrong; academia has never accepted plagiarism of this magnitude. Enforcement is never perfect, but a doctorate is not an undergrad repeating something verbally, and it is not thoughtless writing. It’s a doctoral thesis, for crying out loud; it has to be novel!

Crediting the origin of the idea is the whole point of citing sources. Learning something from someone doesn't mean the idea is yours now. It means that when you repeat that idea, you should cite the original source of the idea.

This is just how scholarship works. It's not needed in the kind of day-to-day work most of us do, but when you're writing a thesis for a PhD, this stuff matters. You're making the argument that you're expanding the totality of human knowledge with your dissertation, and that requires strict source citing to separate your original scholarship from the sources that influenced it.

What are these tools? I often write about stuff on my blog and I know a lot of what I’m writing or thinking about are ideas someone else has come up with (ideas I’ve read and forgotten, or never read and reinvented as a poorer version), but bog-standard LLM DeepResearch never picks up the things I want.

I imagine any tool that’s good at plagiarism detection would also kill it at this kind of literature research.

An example where it did work like this: I had some ideas about how tribes evolve and so on, wrote them down as they came to me, and ChatGPT was able to find that Darwin’s Cathedral had a far better synthesis of various much more rigorous takes on the subject.

> I often write about stuff on my blog and I know a lot of what I’m writing or thinking about are ideas someone else has come up with

These tools compare words, not ideas. They would not detect someone copying concepts but coming up with their own words. I guess some specially fine-tuned LLMs could work, but I am not aware of a company actually licensing those for plagiarism detection.
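
To make "compare words" concrete, here is a rough sketch of word-sequence matching. I don't know how any commercial detector is actually implemented, so treat this as an assumption: it scores two texts by the share of three-word sequences (shingles) they have in common. The function names and sample sentences are made up for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=3):
    """Jaccard similarity between the word n-gram sets of a and b."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts, not taken from the thesis under discussion.
source = "The quick brown fox jumps over the lazy dog near the river bank."
copied = "As noted, the quick brown fox jumps over the lazy dog near the river."
reworded = "A fast auburn fox leaps above a sleepy hound close to the water."

print(overlap(source, copied))    # high: most three-word sequences are shared
print(overlap(source, reworded))  # 0.0: same idea, no shared three-word sequence
```

The reworded sentence carries the same idea but scores zero, which is the blind spot described above: a purely lexical check catches copy-paste and misses paraphrase.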

> I don't know about this case, but a lot of these kinds of cases truly are witch-hunts.

There have already been a lot of plagiarism accusations about his books. In this case there was an audit and the conclusions are clear. Whole paragraphs copied and pasted word for word without attribution, about a third of the document overall. If anything, this should have happened about 15 years ago.

Having seen plagiarism first hand, sometimes it is exceedingly blatant. Like copying from a PDF that was produced via LaTeX: since LaTeX hyphenates words to split them across lines, if you end up keep-ing the hyphenation in, the te-xt reads like this.

I've seen way worse: a Word document submission that preserved the style and fonts of the sources the plagiarist stole from. As in, font "Calibri 14" only appeared in paragraphs nicked from a source entirely written in that font - and the adjoining paragraphs weren't even size 14!!!

Sadly, this idiot won an award before I was able to see their work, so they had the confusion of receiving an award, and THEN being told they were being spanked for unacceptable behavior. Since they were too stupid to hide the most blatant clues, they had a hard time comprehending this duality.