by Arqu 2063 days ago
I'm super interested in this topic. I recently started hashing out (and am still working on) how to diff large datasets and what that even means.

I would love to understand how the HN crowd thinks diffing datasets should work (let's say >1GB in size).

Are you more interested in a "patch"-quality diff of the data, which is more machine-tailored? Or is a change report/summary/highlights more interesting in that case?

Currently I'm leaning more towards the understanding/human consumption perspective which offers some interesting tradeoffs.
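
To make the two options concrete, here is a minimal sketch of a keyed row diff that yields both a machine-applicable patch and a human-readable summary. It assumes every row has a unique primary key and that both versions fit in memory as dicts, which will not hold as-is for a >1GB dataset (there you would likely stream rows sorted by key instead). The names `diff_rows`, `apply_patch`, and `summarize` are made up for illustration.

```python
from collections import Counter

def diff_rows(old, new):
    """Build a patch: a list of (op, key, row) tuples turning `old` into `new`."""
    patch = []
    for key in sorted(old.keys() - new.keys()):
        patch.append(("delete", key, None))
    for key in sorted(new.keys() - old.keys()):
        patch.append(("insert", key, new[key]))
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            patch.append(("update", key, new[key]))
    return patch

def apply_patch(rows, patch):
    """Machine side: replay the patch on a copy of `rows`."""
    result = dict(rows)
    for op, key, row in patch:
        if op == "delete":
            del result[key]
        else:  # insert and update both set the row for this key
            result[key] = row
    return result

def summarize(patch):
    """Human side: collapse the patch into a one-line change report."""
    counts = Counter(op for op, _, _ in patch)
    return (f"{counts['insert']} inserted, {counts['update']} updated, "
            f"{counts['delete']} deleted")

# Hypothetical sample data keyed by primary key.
old = {1: {"name": "a", "qty": 1}, 2: {"name": "b", "qty": 2}, 3: {"name": "c", "qty": 3}}
new = {1: {"name": "a", "qty": 1}, 2: {"name": "b", "qty": 5}, 4: {"name": "d", "qty": 4}}

patch = diff_rows(old, new)
assert apply_patch(old, patch) == new
print(summarize(patch))  # prints: 1 inserted, 1 updated, 1 deleted
```

The summary here is derived from the patch, so the two views are not mutually exclusive; the tradeoff is mostly in how much detail the human-facing report keeps.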

1 comment

Both! I need to be able to handle merge conflicts for data, but I also need the machine to be able to apply the changes.
You should check out dolthub.com. It's a versioned DB tool that allows for diffs and merges.
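
For what "merge conflicts for data" can mean in practice, here is a rough sketch of a row-level three-way merge. This is a generic illustration and not how Dolt implements it. It assumes rows are keyed by a primary key, that `None` never appears as a real row value (it stands for "row absent"), and the function name `three_way_merge` is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`; return (merged, conflicts)."""
    merged, conflicts = {}, []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            winner = o  # both sides agree, or neither touched the row
        elif o == b:
            winner = t  # only theirs changed the row
        elif t == b:
            winner = o  # only ours changed the row
        else:
            # Both sides changed the same row differently: record a conflict
            # and keep the base row until it is resolved.
            conflicts.append((key, o, t))
            winner = b
        if winner is not None:  # None means the row is deleted or absent
            merged[key] = winner
    return merged, conflicts

# Hypothetical sample data.
base = {1: "a", 2: "b", 3: "c"}
ours = {1: "a", 2: "B", 3: "c2"}
theirs = {1: "a", 2: "b", 3: "c3", 4: "d"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # prints: {1: 'a', 2: 'B', 3: 'c', 4: 'd'}
print(conflicts)  # prints: [(3, 'c2', 'c3')]
```

Non-conflicting changes are applied automatically (the machine side), while the conflict list is what a human would need to review.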