Hey HN, I'm one of the creators of DAGsHub (https://DAGsHub.com). Data Science Pull Requests (DS PRs) expand Pull Requests (PRs) to include data, models, and experiments. The idea behind DS PRs is to automate the data science review process and enable Open Source Data Science.
Concretely this means:
- Reviewing, comparing, and commenting on your experiments (metrics, parameters, visualizations), in context
- Seeing what data and models have changed (not just code)
- Comparing and diffing notebooks
- Merging the DS PR once it has been reviewed, which merges code, data, and models all at once
There is a lot of work to be done, and many things to be improved. I really want to make this workflow as simple and effective as possible for everyone, and your input would be greatly appreciated.
Great questions. The data and models are managed via DVC, and shown as an integrated part of the project. This means that you manage the data & model versions in Git, and the files themselves in your preferred storage.
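To make that split a bit more concrete, here is a rough Python sketch of the general idea: a small pointer file that you would commit to Git, and the actual bytes copied into separate storage keyed by their hash. This is only an illustration under my own assumptions, not DVC's actual code or file format; `track_file`, `restore_file`, and the `.pointer.json` suffix are hypothetical names made up for this example, and a local folder stands in for "your preferred storage".

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track_file(data_path: Path, storage_dir: Path) -> Path:
    """Copy a data file into hash-addressed storage and write a small pointer file.

    The pointer file is the only thing that would be committed to Git.
    (Hypothetical helper, not part of DVC.)
    """
    digest = hashlib.md5(data_path.read_bytes()).hexdigest()

    # Store the real bytes under a path derived from the content hash.
    target = storage_dir / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_path, target)

    # Write the lightweight metadata that stands in for the file in Git.
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    meta = {"md5": digest, "size": data_path.stat().st_size, "path": data_path.name}
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore_file(pointer: Path, storage_dir: Path) -> Path:
    """Recreate the data file next to its pointer by looking up the stored hash."""
    meta = json.loads(pointer.read_text())
    source = storage_dir / meta["md5"][:2] / meta["md5"][2:]
    dest = pointer.with_name(meta["path"])
    shutil.copyfile(source, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        model = root / "model.bin"
        model.write_bytes(b"weights-v1")

        pointer = track_file(model, root / "storage")
        model.unlink()  # simulate a fresh clone that only has the pointer file
        restored = restore_file(pointer, root / "storage")
        print(pointer.name, "->", restored.name)
```

Because the pointer changes whenever the file's hash changes, a diff of the pointer in Git is enough to tell that a dataset or model changed, without putting the large file itself in the repository.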
You can currently connect an existing repository by entering its address (and the appropriate auth if necessary). We plan to add a tighter GitHub integration very soon!
Learning to use Data Science PRs is straightforward. Read more here: https://dagshub.com/docs/collaborating_on_dagshub/data_scien...