HN Mirror


by Arqu 2063 days ago

I'm super interested in this topic. I recently started (and am still) hashing out how to diff large datasets and what that even means.

I would love to understand how the HN crowd thinks diffing datasets should work (let's say >1GB in size).

Are you more interested in a "patch"-quality diff of the data, which is more machine-tailored? Or is a change report/summary/highlights more interesting in that case?

Currently I'm leaning more towards the understanding/human consumption perspective which offers some interesting tradeoffs.
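
To make the two options concrete, here is a minimal sketch, assuming each dataset can be keyed by a primary key and (for illustration only) fits in memory as a dict of rows. The names `diff_rows`, `apply_patch`, and `summarize` are hypothetical, not from any existing tool. The same list of operations serves as the machine-applicable "patch" and as the input to a human-readable summary.

```python
from collections import Counter


def diff_rows(old, new):
    """Return patch operations that turn `old` into `new`.

    `old` and `new` map a primary key to a row (a dict of column -> value).
    Each operation is a tuple: (kind, key, row_before, row_after).
    """
    ops = []
    for key in old.keys() - new.keys():
        ops.append(("delete", key, old[key], None))
    for key in new.keys() - old.keys():
        ops.append(("insert", key, None, new[key]))
    for key in old.keys() & new.keys():
        if old[key] != new[key]:
            ops.append(("update", key, old[key], new[key]))
    # Sort so the patch is deterministic regardless of set iteration order.
    return sorted(ops, key=lambda op: (op[0], str(op[1])))


def apply_patch(old, ops):
    """Machine-oriented view: apply the operations to a copy of `old`."""
    result = dict(old)
    for kind, key, _before, after in ops:
        if kind == "delete":
            del result[key]
        else:  # "insert" and "update" both set the new row
            result[key] = after
    return result


def summarize(ops):
    """Human-oriented view: counts per kind of change."""
    counts = Counter(kind for kind, *_ in ops)
    return (
        f"{counts['insert']} inserted, "
        f"{counts['update']} updated, "
        f"{counts['delete']} deleted"
    )


old = {
    1: {"name": "a", "qty": 1},
    2: {"name": "b", "qty": 2},
    3: {"name": "c", "qty": 3},
}
new = {
    1: {"name": "a", "qty": 1},
    2: {"name": "b", "qty": 5},
    4: {"name": "d", "qty": 4},
}

ops = diff_rows(old, new)
print(summarize(ops))
print(apply_patch(old, ops) == new)
```

For data well over 1GB this in-memory version would probably not be practical; a streaming comparison over inputs sorted by key is one likely alternative, but that is an assumption on top of the sketch.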

1 comments

nerdponx 2063 days ago

Both! I need to be able to handle merge conflicts for data, but I also need the machine to be able to apply the changes.


konda 2063 days ago

You should check out dolthub.com. It's a versioned DB tool that allows for diffs and merges.
