| HN Mirror

by chaxor 1094 days ago

I didn't exactly intend for it to operate precisely the same way that git does, but rather to have extensions of git that unify the system into one easy-to-use version control for data and code.

In most projects today, the code is (or at least generates) the data. This is true for materials science in physics, for neural networks, and for databases built via ETL. So it would make sense not to require users of some piece of software to regenerate that data, which may take 2 months on a supercomputer. Downloading it would be much faster. You can put it on a university server or on AWS, but then the data lives in a system that is not guaranteed to stay around. In fact, it's almost guaranteed to *not* be there after a fairly short time (people change positions and lose their access to these servers constantly).

So the very obvious best solution is IPFS for distributing the data, but it does need to be linked to the git repo somehow. Of course, the data may not be simple or textual, and so may not play well with the simple text-based diffs used for version control. Using something like borg can address both data privacy, if needed, and block-based diffs.
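
To make the "linked to the git repo somehow" part a bit more concrete, here is a rough sketch of one way it could work: commit a small manifest of block hashes to git and keep the actual blocks in some content-addressed store. This is only an illustration under my own assumptions. The fixed-size SHA-256 blocks are a simplification, not how IPFS or borg actually chunk or address data, and `chunk_hashes`, `build_manifest`, and `changed_blocks` are hypothetical names, not part of any real tool.

```python
import hashlib
import json

# Tiny block size so the example is readable; real tools use far larger blocks.
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return each block's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def build_manifest(name: str, data: bytes) -> str:
    """Return a small JSON manifest that could be committed to git instead of the data."""
    manifest = {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": chunk_hashes(data),
    }
    return json.dumps(manifest, indent=2, sort_keys=True)


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indices of blocks in the new version whose hash was not present in the old one."""
    known = set(old)
    return [i for i, digest in enumerate(new) if digest not in known]


# Two versions of a "dataset" that differ in a single block.
v1 = b"aaaabbbbccccdddd"
v2 = b"aaaabbbbXXXXdddd"
h1, h2 = chunk_hashes(v1), chunk_hashes(v2)
new_blocks = changed_blocks(h1, h2)  # only these blocks would need to be uploaded
manifest_text = build_manifest("dataset.bin", v2)
```

The manifest is plain text, so it diffs fine in git, while only the blocks listed in `new_blocks` would have to be pushed to wherever the data lives.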

So this isn't to suggest "just git everything", but rather to say, 'if there's a new version control system for data and code, it's probably added some improvements to fit, and this could be a direction that makes sense'.

So I was checking to see if it had gone that direction yet.