by kmike84
2253 days ago
We needed to solve a similar problem: version-control and synchronize .json files (annotations for ML models) from different machines. Writing a custom git merge driver was quite painless. It is a command-line script (written in Python) with task-specific logic for merging data from these .json files: load the files, parse them, decide how to combine them, detect unresolvable conflicts, etc. It seems that merging structured data may need custom logic; there is no single best solution, which could make building a generic tool harder.

git is not a bad base technology for this. I'm not sure what else we are missing (better diffs for structured data, maybe?), because .json is still text; it is only merges that are unreliable if you treat .json as text. There are caveats, though. For example, you can't install a custom merge driver on GitHub, so the "merge" button becomes dangerous. But overall, for .json this approach works fine.
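For anyone wanting to try this, here is a minimal sketch of such a driver, not the script described above. It assumes each annotation file is a top-level JSON object keyed by item ID, and the names `merge_json.py` and `jsonmerge` are hypothetical. The setup in the docstring follows git's documented convention for custom merge drivers: git substitutes `%O`, `%A` and `%B` with temporary files holding the ancestor, current and other versions, expects the result written back into `%A`, and treats a non-zero exit status as a conflict.

```python
#!/usr/bin/env python3
"""Hypothetical three-way merge driver for JSON annotation files (a sketch).

Assumed setup (the driver name "jsonmerge" and file name are illustrative):

    # .gitattributes
    *.json merge=jsonmerge

    # .git/config
    [merge "jsonmerge"]
        name = three-way JSON merge
        driver = python3 merge_json.py %O %A %B

git passes the ancestor (%O), current (%A) and other (%B) versions as
temporary files; the driver must leave the result in %A and exit non-zero
if conflicts remain.
"""
import json
import sys

_MISSING = object()  # marks a key that is absent from one version


def load(path):
    """Read a JSON object; treat an empty file (e.g. no ancestor) as {}."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return json.loads(text) if text.strip() else {}


def merge_three_way(base, ours, theirs):
    """Merge three dicts key by key; return (merged, conflicting_keys).

    Assumes each file is a top-level JSON object such as {item_id: label}.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (including both deleting it)
            result = o
        elif o == b:      # only "theirs" changed this key
            result = t
        elif t == b:      # only "ours" changed this key
            result = o
        else:             # both changed it differently: cannot auto-resolve
            conflicts.append(key)
            result = o    # keep our side; a human must review it
        if result is not _MISSING:
            merged[key] = result
    return merged, conflicts


def main(argv):
    # Extra arguments (e.g. %P, the path name) are ignored.
    base_path, ours_path, theirs_path = argv[1:4]
    merged, conflicts = merge_three_way(
        load(base_path), load(ours_path), load(theirs_path))
    # git expects the merged result to be written back into the %A file.
    with open(ours_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2, sort_keys=True)
        f.write("\n")
    for key in conflicts:
        print(f"unresolved conflict on key {key!r}", file=sys.stderr)
    return 1 if conflicts else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # git always passes the file paths; a wrong argument count raises and
    # exits non-zero, which git reports as a failed merge.
    sys.exit(main(sys.argv))
```

Keys changed on only one side merge cleanly, and keys changed differently on both sides are reported on stderr with exit status 1, so git marks the file as conflicted. The per-key rule is just one possible choice; real annotation data will likely need its own combination logic.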
It operates on data at the same level as git but with features needed for large datasets and is totally language and framework agnostic like git.
[1]: https://dvc.org/