Hacker News
by fwilliams 1524 days ago
To quote the article:

> But even putting aside the fact that claiming someone else's writing as one's own is wrong, the value in survey papers is in how they re-frame the field. A survey paper that just copies directly from the prior paper hasn't contributed anything new to the field that couldn't be obtained from a list of references.

Good survey papers can be important contributions in their own right (e.g. [1]). A good survey should contextualize works within a subject area with respect to each other and identify high-level trends and ideas in that subject. These connections are useful not only for learning a topic, but also for positioning novel work or identifying under-researched areas to focus on.

If the authors felt that one of the papers they plagiarized concisely expressed what they wanted to say, they could simply quote and cite that work. Otherwise, it could be construed that the authors are claiming to be the ones drawing the conclusions they wrote. Moreover, from the article, the survey in question seems to be pretty egregiously plagiarizing, which deserves to be called out/shamed.

[1] https://arxiv.org/abs/2111.11426

1 comments

I disagree with this:

> But even putting aside the fact that claiming someone else's writing as one's own is wrong, the value in survey papers is in how they re-frame the field. A survey paper that just copies directly from the prior paper hasn't contributed anything new to the field that couldn't be obtained from a list of references.

Whether or not a survey paper is "good" is irrelevant here. Yes, a survey paper that just lists other papers may be a bad survey paper, but it does nothing wrong as long as it cites the original papers, which this one does. A bad survey paper may not get published in a journal (that's what peer review is for), but there is nothing wrong with publishing it openly on the web.

And there is still value in aggregating other papers, even if it's just a list with descriptions. That's the reason why these "awesome-XX" GitHub repos are so popular. Time to hunt them down?

If you look at the plagiarized language in the article, it seems as if the BM paper authors are claiming contributions. Credit is a major currency in research, and it's important to give it where it is due. If someone did this with one of my papers, I'd be quite upset.

For example (emphasis mine):

> The risks of data memorization, for example, the ability to extract sensitive data such as valid phone numbers and IRC usernames, are highlighted by Carlini et al. [41]. While their paper identifies 604 samples that GPT-2 emitted from its training set, we show that over 1% of the data most models emit is memorized training data. In computer vision, memorization of training data has been studied from various angles for both discriminative and generative models. Deduplicating training data does not hurt perplexity: models trained on deduplicated datasets have no worse perplexity compared to baseline models trained on the original datasets. In some cases, deduplication reduces perplexity by up to 10%. Further, because recent LMs are typically limited to training for just a few epochs

Yes, I agree that's bad, but it looks like sloppy copy-and-pasting rather than intentional plagiarism to claim contributions. Would it have been okay if they had said "they" instead of "we"?

Then who is "they" in this situation? You need a citation!