by vedtopkar 1467 days ago
I'll echo that it's a tremendous amount of work to appropriately format data and metadata for upload to a public repository. It's not as simple as just mass uploading raw data files. It's no surprise that people skirt around this requirement when possible, especially as the size of files and the number of samples per paper continue to balloon.

That's not to say that it's not important, but the labor required is vastly underappreciated. I say this as someone who has completed the task for many papers.

2 comments

Yeah this is the main reason, as anyone actually doing research knows. I don’t think civilians realise what an incredible amount of work it is to publish a paper.
yeah. holy hell is it a lot of work. hate doing it even when I know I'll get a paper out of it.
Hell, even things like journals not requiring formatting until after acceptance are taken as a godsend.
Is there some way to work this into the system? Like by having a different person do this work? At a very high level, all of science would benefit from more data being properly formatted and published, so maybe this is something that could be budgeted for in the planning phase? Or the work could be done by undergraduates studying in the field?
The amount for a modular (i.e. the default, you don't have to have a super detailed budget) NIH R01, which is the basic unit of biomedical science funding, has been stagnant since 1999.

Budgeting for a separate person to do this would eat up a tremendous amount of said budget.

Which is fine, except that from a productivity standpoint, the person who budgeted for that position is probably short a graduate student compared to the person who didn't, all on the outside chance that someone cares about their stuff. For example, I looked at my lab's publicly available repositories linked to papers on GitHub.

Watches: 15 (probably half of these are people involved in the project)
Forks: 3
Stars: 4
Visitors in the Past Month: 4

That's it. For all of them. While I keep doing it based on principle, if I stopped tomorrow, it would impact me not at all.

And undergraduates, candidly, are not usually time savers.