by torchous 3257 days ago
Any thoughts on adding DOIs? It's a complex subject, particularly with respect to versioning (new DOI per version? How to keep track?). It would help tremendously with the academic community, for the bean counting.
2 comments

The package name + hash is an implicit DOI. What if we added web support for it so that users could visit https://quiltdata.com/packages/USER/PKG?doi=SOME_HASH ?
Yes, but it's not globally recognizable. That's why DOIs are standardized through ISO: https://www.doi.org/ Internally you could implement a DOI->HASH mapping, but a Quilt hash isn't going to help in the reference list of a paper. If you're lucky you can copy and paste it, but how do you know where to go? What happens if your package organization changes internally, and so forth?
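To make the DOI->HASH mapping idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `PackageVersion`, `DoiRegistry`, and the example DOI are hypothetical names, not part of Quilt, and the URL shape is just the one floated earlier in this thread. It treats DOIs as case-insensitive and refuses to re-point a DOI at a different hash, so a citation keeps resolving to the same data.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class PackageVersion:
    """Hypothetical record of one immutable package version."""
    user: str
    package: str
    content_hash: str


class DoiRegistry:
    """Hypothetical in-memory DOI -> package version mapping."""

    def __init__(self):
        self._by_doi = {}

    def register(self, doi, version):
        # DOI names are case-insensitive, so normalize before storing.
        key = doi.lower()
        existing = self._by_doi.get(key)
        if existing is not None and existing != version:
            # A DOI that silently moves to a new hash would defeat its purpose.
            raise ValueError(f"{doi} is already bound to {existing.content_hash}")
        self._by_doi[key] = version

    def resolve(self, doi):
        # Raises KeyError for an unknown DOI.
        version = self._by_doi[doi.lower()]
        query = urlencode({"doi": version.content_hash})
        # URL pattern as proposed in this thread; assumed, not an existing endpoint.
        return (
            f"https://quiltdata.com/packages/"
            f"{version.user}/{version.package}?{query}"
        )
```

Usage would look like `registry.register("10.5555/example.1", PackageVersion("alice", "weather", "abc123"))` followed by `registry.resolve("10.5555/example.1")`. The DOI here is a placeholder, and a real deployment would still need a registration agency to make the DOI resolvable outside Quilt.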
This is more than adequate. DOIs are simply redirects. It's up to the data owner to point the DOI at whatever resource contains the data. If a DOI is registered, it can be pointed to the quilt URL.

You can take it a step further and either integrate with a DOI provider or become one yourself, then build the registration process into your API or provide command-line tools for it.

Good suggestion! From what we've heard from academic users, they'll want a DOI for a specific version, e.g., data from a particular paper or journal article. Anything else we should watch out for?
It's problematic when data publisher != data user/paper writer. I'm not familiar enough with DOI minting, so I don't know what issues might come from generating DOIs at large scale for minuscule changes in the data. Ultimately, if I make data openly available, the worst case is that every change to the data requires a new DOI, because I don't know how many people have downloaded earlier versions and not yet published on them, or don't care about my added cleaning (or think it's wrong). I haven't done it in a while, but GitHub's collaboration with Zenodo results in a zip file hosted there. Obviously, that reduces the number of DOIs created, but it's not great: as soon as my code changes and someone uses the new version in a paper, they'll still cite the old DOI, potentially making the results irreproducible. The same is true for data. On the researcher side, you may end up with hundreds of DOIs, each with zero to a few citations. Also not great. A happy medium might be to leave it to the data generator to create DOIs for set versions, and to send anyone resolving the DOI to a landing page that links to that original version and any updates since (maybe indicating later releases that have their own DOIs). That would certainly make me happy as a data user / supplier.
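The landing-page idea could be sketched roughly like this in Python. This is only an illustration under assumed names (`Release`, `landing_page`) and an assumed data model: a list of releases in chronological order, where only some releases carry a DOI. Resolving a DOI returns the exact release that was cited plus every later release, so a reader can see which of those have DOIs of their own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical package release; only some releases get a DOI."""
    content_hash: str
    doi: Optional[str] = None


def landing_page(releases: List[Release], doi: str) -> dict:
    """Return the cited release and all releases published after it.

    `releases` is assumed to be ordered oldest to newest.
    """
    for index, release in enumerate(releases):
        if release.doi == doi:
            return {
                "cited": release,                # the exact version the paper used
                "later": releases[index + 1:],   # updates since, with or without DOIs
            }
    raise KeyError(f"no release registered for DOI {doi}")
```

With releases `[Release("h1", "10.5555/a"), Release("h2"), Release("h3", "10.5555/b")]` (placeholder hashes and DOIs), resolving `10.5555/a` would show `h1` as the cited version and list `h2` and `h3` as updates, with `h3` marked as separately citable.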
Thanks for describing the problem. That's really interesting. We can certainly aggregate counts of downloads and installs across versions in Quilt. I'll definitely look into providing DOIs within Quilt and see if it's something we can do.