by afandian
638 days ago
I think the diversification is a strength, honestly. CERN and high-energy physics have _massive_ datasets, and making them all available online isn't practical. Other researchers may have only one or two files that they want to cite as part of a paper. Healthcare research may involve confidential data that requires specific kinds of access control. I don't think GitHub would be financially sustainable or scalable if it had to host millions of one-file repos alongside repos that grow by terabytes per day and others that hold highly sensitive data.
The usual solution is to make a skeleton repo with partial or no code, where the real substance is a README that explains what the project is and how to use it. In a way, GitHub is a social network as well as a code warehouse, and this comes with benefits: the same system of stars, issues, user groups, permissions, etc. extends across all projects, regardless of whether the code or data is actually hosted on GitHub. Something like this for science could be of huge benefit.