Good suggestion! From what we've heard from academic users, they'll want a DOI for a specific version, e.g., the data behind a particular paper or journal article. Anything else we should watch out for?
It's problematic when data publisher != data user/paper writer. I'm not familiar enough with DOI minting, so I don't know what issues generating DOIs at large scale for minuscule changes in the data might bring. Ultimately, if I make data openly available, the worst case is that every change to the data requires a new DOI, because I don't know how many people have downloaded earlier versions and not published on them yet, or don't care about my added cleaning (or think it's wrong). I haven't done it in a while, but GitHub's collaboration with Zenodo results in a zip file hosted there. Obviously, that reduces the number of DOIs created, but it's not great: as soon as my code changes and someone uses the changed version in a paper, they'll still cite the old DOI, potentially making the results irreproducible. The same is true for data. On the researcher side, you may end up with hundreds of DOIs, each with zero to a few citations. Also not great. A happy medium might be to leave it to the data generator to create DOIs for set versions, and to drop anyone resolving the DOI on a landing page that links to that original version and to any updates since (maybe indicating which later releases have their own DOIs). That would certainly make me happy as a data user / supplier.
Thanks for describing the problem. That's really interesting. We can certainly aggregate counts of downloads and installs across versions in Quilt. I'll definitely look into providing DOIs within Quilt and see if it's something we can do.
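To make the landing-page idea concrete, here is a rough sketch of how it could fit together with download counts aggregated across versions. Everything in it is an assumption for illustration: the `Release` and `Dataset` classes, the `landing_page` method, and the DOIs (which use the 10.5555 example prefix) are hypothetical and are not part of Quilt or any DOI registry API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    """One published version of a dataset (hypothetical model)."""
    version: str
    downloads: int = 0
    doi: Optional[str] = None  # set only when the data generator mints a DOI


@dataclass
class Dataset:
    name: str
    releases: List[Release] = field(default_factory=list)  # oldest first

    def total_downloads(self) -> int:
        # Aggregate usage across all versions, with or without a DOI.
        return sum(r.downloads for r in self.releases)

    def landing_page(self, doi: str) -> dict:
        """Resolve a DOI to its exact version and list any updates since."""
        for i, release in enumerate(self.releases):
            if release.doi == doi:
                later = self.releases[i + 1:]
                return {
                    "resolved_version": release.version,
                    "later_versions": [r.version for r in later],
                    "later_dois": [r.doi for r in later if r.doi],
                    "total_downloads": self.total_downloads(),
                }
        raise KeyError(f"unknown DOI: {doi}")


# Placeholder data: only 1.0 and 2.0 were given DOIs by the data generator.
dataset = Dataset("example", [
    Release("1.0", downloads=10, doi="10.5555/example.v1"),
    Release("1.1", downloads=5),
    Release("2.0", downloads=7, doi="10.5555/example.v2"),
])
page = dataset.landing_page("10.5555/example.v1")
print(page)
```

The old DOI still resolves to exactly the version that was cited, so reproducibility is kept, while the page shows that newer releases exist and which of them can be cited separately.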