Hacker News
by mnw21cam 1349 days ago
https://pubmed.ncbi.nlm.nih.gov/27552985/ estimates that about one fifth of papers with supplementary Excel lists of genes contain mangled gene names. I remember talking about this problem back in 2003. The HGNC has been quietly going around changing the names of some of these genes to try and stop this from being a problem.
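A quick way to screen a supplementary gene list for this kind of damage is to flag entries that look like Excel's date or scientific-notation output. For example, SEPT2 is shown as "2-Sep" and MARCH1 as "1-Mar". Below is a minimal Python sketch. The function name `find_suspect_gene_names` and both regex patterns are illustrative assumptions on my part, not the detection rules used in the paper linked above.

```python
import re

# Illustrative patterns (an assumption, not the cited paper's exact rules).
# Date-like: "2-Sep", "1-Mar", "Sep-2" (symbols such as SEPT2 or MARCH1 after
# Excel date conversion).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}})$", re.IGNORECASE
)
# Scientific notation: long numeric-looking identifiers that were turned
# into numbers, e.g. "2.31E+19".
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E[+-]\d+$", re.IGNORECASE)


def find_suspect_gene_names(genes):
    """Return the entries that look like Excel date or number conversions."""
    suspects = []
    for g in genes:
        s = g.strip()
        if DATE_LIKE.match(s) or SCI_NOTATION.match(s):
            suspects.append(g)
    return suspects


genes = ["TP53", "2-Sep", "1-Mar", "BRCA1", "2.31E+19", "MARCHF1"]
print(find_suspect_gene_names(genes))  # ['2-Sep', '1-Mar', '2.31E+19']
```

This only catches the displayed forms. A list saved after conversion may also contain date serial numbers or full dates, which these patterns deliberately ignore.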

Thanks for the pointer. Indeed, a more recent paper (cited below) estimates an even higher error rate (30.9%), but the fact that we are not talking about 0.001% tells me that Excel is simply a non-starter for this kind of work. (Actually, this is just one of many reasons why I discourage my students from using Excel for any dataset.)

Abeysooriya, Mandhri, Megan Soria, Mary Sravya Kasu, and Mark Ziemann. “Gene Name Errors: Lessons Not Learned.” PLoS Computational Biology 17, no. 7 (July 30, 2021): e1008984. https://doi.org/10.1371/journal.pcbi.1008984.