by tastroder
996 days ago
This is from 2021, so not really news, and both the arXiv version and the version published at NeurIPS have quite a few citations. No one's suppressing this; people who don't reflect on their datasets or how they use them either don't care or fail to acknowledge that these are actual issues.
|
I note neither this paper nor any discussion of "BookCorpus" or even "book corpus" has appeared on HN previously.
Addressing "Documentation Debt" in Machine Learning Research: A Retrospective Datasheet for BookCorpus, 2021, Jack Bandy and Nicholas Vincent
https://arxiv.org/pdf/2105.05241.pdf?