| (Full disclosure: I'm a Pachyderm employee.) Good question. It's funny how often collaboration is overlooked. And you're right, it's not obvious how a data store can enable collaboration. In the software engineering world, collaboration by means of git is so prevalent it's like breathing air. There's no such thing today for data scientists! That's crazy, because doing data science involves more variables than writing software alone. Pachyderm stores your data in a git-like manner. We store the deltas and version the data so that it's consistently reproducible. We also give you some nice tools to run any code alongside your data. This enables some very basic workflows: 1 - You're trying to develop your analysis, so you work on your code and lock your data. 2 - You're trying to vet new data, so you develop and version your feature extraction and data together. 3 - You're trying to work on some analysis with colleagues, so you fork the data + analysis to do your work, then merge to make sure your work is compatible before deploying. There are many more, but hopefully that makes it a bit more concrete. |
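
To make "git-like" a bit more tangible, here is a toy sketch in plain Python. It is not Pachyderm's API or implementation. `ToyDataRepo`, its methods, the branch names, and the file contents are all hypothetical, and the merge is a deliberately naive two-way check rather than a real three-way merge. It only illustrates the ideas above: each commit stores a delta, any old version can be rebuilt exactly, and a fork can be merged back.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """One version of the data: only the files changed since the parent."""
    id: int
    parent: Optional[int]
    delta: Dict[str, str]


class ToyDataRepo:
    """Hypothetical in-memory stand-in for a git-like data store."""

    def __init__(self) -> None:
        self.commits: Dict[int, Commit] = {}
        self.branches: Dict[str, Optional[int]] = {"master": None}

    def commit(self, branch: str, delta: Dict[str, str]) -> int:
        """Record changed files on a branch and return the new commit id."""
        cid = len(self.commits)
        self.commits[cid] = Commit(cid, self.branches[branch], dict(delta))
        self.branches[branch] = cid
        return cid

    def fork(self, source: str, new_branch: str) -> None:
        """Start a new branch at the source branch's current commit."""
        self.branches[new_branch] = self.branches[source]

    def snapshot(self, commit_id: Optional[int]) -> Dict[str, str]:
        """Rebuild the full file set at a commit by replaying its deltas."""
        chain = []
        while commit_id is not None:
            c = self.commits[commit_id]
            chain.append(c)
            commit_id = c.parent
        files: Dict[str, str] = {}
        for c in reversed(chain):  # oldest first, newer deltas win
            files.update(c.delta)
        return files

    def merge(self, source: str, target: str) -> int:
        """Naive merge: bring over files the target lacks, refuse on clashes."""
        ours = self.snapshot(self.branches[target])
        theirs = self.snapshot(self.branches[source])
        conflicts = sorted(
            path for path in ours.keys() & theirs.keys()
            if ours[path] != theirs[path]
        )
        if conflicts:
            raise ValueError(f"conflicting files: {conflicts}")
        new_files = {p: v for p, v in theirs.items() if p not in ours}
        return self.commit(target, new_files)


repo = ToyDataRepo()
v1 = repo.commit("master", {"users.csv": "id,name\n1,ann\n"})
v2 = repo.commit("master", {"events.csv": "id,type\n1,click\n"})

# Workflow 1: iterate on analysis code against data pinned at v1.
locked = repo.snapshot(v1)

# Workflow 3: a colleague forks, adds derived data, then merges back.
repo.fork("master", "alice")
repo.commit("alice", {"features.csv": "id,score\n1,0.5\n"})
merged = repo.merge("alice", "master")
```

In this sketch, `locked` stays the same no matter how many commits land afterwards, which is the "lock your data" part. The merge either succeeds cleanly or fails loudly, which is the "make sure your work is compatible before deploying" part.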