by vishnurnair
638 days ago
Solutions for the specific problems I mentioned do exist for individual niches, but none of them solves the problem well across all niches, which is what I believe is necessary. What we need is for all datasets from scientific papers to be easily accessible and licensed like code.
|
CERN and other high-energy physics labs have _massive_ datasets. Making all of them available online isn't practical.
Other researchers may have one or two files that they want to cite as part of a paper.
Healthcare research may involve confidential data that requires specific types of access control.
I don't think GitHub would be financially sustainable or scalable if it had to host millions of one-file repos alongside repos that grow by terabytes per day and repos that hold highly sensitive data.