HN Mirror

by fwilliams 1570 days ago

If you look at the plagiarized language in the article, it seems as if the BM paper authors are claiming the contributions as their own. Credit is a major currency in research, and it's important to give it where it is due. If someone did this with one of my papers, I'd be quite upset.

For example (emphasis mine):

> The risks of data memorization, for example, the ability to extract sensitive data such as valid phone numbers and IRC usernames, are highlighted by Carlini et al. [41]. While their paper identifies 604 samples that GPT-2 emitted from its training set, we show that over 1% of the data most models emit is memorized training data. In computer vision, memorization of training data has been studied from various angles for both discriminative and generative models
>
> Deduplicating training data does not hurt perplexity: models trained on deduplicated datasets have no worse perplexity compared to baseline models trained on the original datasets. In some cases, deduplication reduces perplexity by up to 10%. Further, because recent LMs are typically limited to training for just a few epochs

1 comment

mywaifuismeta 1570 days ago

Yes, I agree that's bad, but it looks like sloppy copy-and-pasting rather than intentional plagiarism to claim contributions. Would it have been okay if they had said "they" instead of "we"?

fwilliams 1570 days ago

Then who is "they" in this situation? You need a citation!
