by abuehrle 1863 days ago
This is really cool! I would have liked to have incorporated this into my vaccine appointment slot finder tool a few months ago. I like using git commits for change tracking too. Seems not dissimilar (though not identical) to what they're doing at Dolt (https://www.dolthub.com/).
1 comments

Yup, there's Dolt, and DVC, and probably a dozen other projects I'm forgetting or haven't heard of. Dat!

There's more than one way to data. We looked at a bunch of them, and the key thing we keep coming back to is git semantics. In many ways, all these other projects attempt to graft git semantics on top of more scalable datastores, allowing you to "fork" your data or roll it back to a given version. Trouble is, these abstractions have subtly different semantics or behaviors. These aren't inherently bad — just not the same as the ones you know from git.

This approach sacrifices "scalability" in order to let you Just Use Git™. It won't work (well) for a larger dataset, but we find that it's useful in a ton of situations.

For example: I have personally shipped bugs to production because my test fixtures had stale example data. I should have remembered to create new fixtures, but I didn't. Flat could have made them for me, on a schedule, subsampling and anonymizing production data as it worked.
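
To make that concrete, here's a rough sketch of what a "subsample and anonymize" step could look like in Python. This is not Flat's API. The field names (`email`, `name`), the salt, and the helper names are all made up for illustration, and a real pipeline would need a proper review of what counts as sensitive.

```python
import hashlib
import json
import random


def anonymize(record, secret="fixture-salt"):
    """Replace the (hypothetical) 'email' and 'name' fields with stable pseudonyms."""
    out = dict(record)  # copy so the input record is left untouched
    for field in ("email", "name"):
        if field in out:
            digest = hashlib.sha256((secret + str(out[field])).encode("utf-8")).hexdigest()
            out[field] = f"{field}_{digest[:12]}"
    return out


def make_fixture(records, sample_size, seed=0):
    """Take a seeded random subsample of records and anonymize each one."""
    rng = random.Random(seed)  # fixed seed: same input gives the same sample
    k = min(sample_size, len(records))
    return [anonymize(r) for r in rng.sample(records, k)]


def to_fixture_json(fixture):
    """Serialize with sorted keys so the committed file diffs cleanly."""
    return json.dumps(fixture, indent=2, sort_keys=True) + "\n"
```

The fixed seed and sorted keys are there so that the committed fixture only changes when the underlying data does, which keeps the git history readable.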

It's a subtle difference in application. If your goal is to version $BIGDATA, then Flat isn't the right tool for the job, and you should check out Dolt, DVC & co.