by samblr 2137 days ago
Thank you for your reply.

I used Talend extensively three years ago, but I didn't have a schema-diff use case at the time. For data diff, though, you can easily define a workflow. And I have to admit these workflows are crazy powerful and can even help fix the data with whatever transformation is required (no-code or code).

However, I'm seeing the use case for a lightweight tool with a visual aspect. I like this. But will this problem be big enough for VC investment is the question? I see schema diff as something that could be just a plugin in one of the existing database tools. And if you are getting into data diff, you have to look at what those tools do too.
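To illustrate how small the core of a schema diff can be (which supports the "just a plugin" point), here is a minimal Python sketch. It assumes each table's schema has already been read into a plain column-name-to-type dictionary; the function name `schema_diff` and the sample columns are hypothetical and not taken from Talend, Datafold, or any database tool.

```python
from typing import Dict, List

# Hypothetical representation: column name -> declared SQL type.
Schema = Dict[str, str]


def schema_diff(old: Schema, new: Schema) -> Dict[str, list]:
    """Report columns added, removed, or whose declared type changed."""
    added: List[str] = sorted(c for c in new if c not in old)
    removed: List[str] = sorted(c for c in old if c not in new)
    # Columns present in both schemas but with a different type.
    retyped = sorted(
        (c, old[c], new[c]) for c in old.keys() & new.keys() if old[c] != new[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


before = {"id": "INTEGER", "email": "VARCHAR", "signup_ts": "TIMESTAMP"}
after = {"id": "BIGINT", "email": "VARCHAR", "country": "VARCHAR"}
print(schema_diff(before, after))
```

The hard part in practice is not this comparison but reading schemas reliably from each database, which is why it fits naturally inside an existing database tool.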


> But will this problem be big enough for VC investment is the question?

That's a great question. When you think about where problems arise in data pipelines, there are fundamentally two moving pieces: 1) Your data: you're continuously getting new data without a real ability to enforce your assumptions about its schema or shape. 2) Your ingestion and transformation code, which needs to evolve with the business and adapt to changes in other parts of the infra.

Datafold's Diff tool currently mostly addresses #2. It can add value to any company that runs ETL pipelines, but it is most impactful for large data engineering teams (a similar story to CI or automated testing tools).
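For #2, the basic idea of a data diff can be sketched as comparing the rows produced by the current code against the rows produced by a changed version, matched on a primary key. The Python below is a hypothetical, in-memory illustration under that assumption (names like `data_diff`, `prod`, and `dev` are made up); it is not how Datafold's Diff is implemented, which would run against warehouse tables rather than Python lists.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def data_diff(old_rows: List[Row], new_rows: List[Row], key: str) -> Dict[str, Any]:
    """Compare two versions of a table, matching rows on a primary key column."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    changed: Dict[Any, List[str]] = {}
    for k in old.keys() & new.keys():
        # Columns whose values differ for the same primary key.
        cols = sorted(
            c for c in old[k].keys() | new[k].keys() if old[k].get(c) != new[k].get(c)
        )
        if cols:
            changed[k] = cols
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "changed": dict(sorted(changed.items())),
    }


# Hypothetical output of the production code vs. a modified development branch.
prod = [
    {"id": 1, "amount": 10.0, "status": "paid"},
    {"id": 2, "amount": 5.0, "status": "open"},
    {"id": 3, "amount": 7.5, "status": "paid"},
]
dev = [
    {"id": 1, "amount": 10.0, "status": "paid"},
    {"id": 2, "amount": 5.5, "status": "open"},
    {"id": 4, "amount": 3.0, "status": "open"},
]
print(data_diff(prod, dev, key="id"))
```

Keying on a primary key lets the diff separate "rows appeared or disappeared" from "rows changed value", which is usually what a reviewer of a code change wants to know.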

Regarding #1, wouldn't it be useful if we tracked ALL your datasets over time and alerted you to anomalies in them? And I am not talking about rigid "unit" tests, e.g. X <= value < Y, but actual stats-based anomaly detection, akin to what Uber does: https://eng.uber.com/monitoring-data-quality-at-scale/
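The difference between a rigid test and a stats-based check can be shown with a simple z-score against a metric's own history. This is a deliberately minimal stand-in, assuming daily row counts as the tracked metric and a hypothetical threshold of 3 standard deviations; Uber's system, as described in the linked post, is considerably more sophisticated, and this is not Datafold's method either.

```python
import statistics
from typing import List


def rigid_check(value: float, low: float, high: float) -> bool:
    """'Unit'-style test: passes iff low <= value < high."""
    return low <= value < high


def is_anomaly(history: List[float], value: float, z_threshold: float = 3.0) -> bool:
    """Flag value if it is more than z_threshold sample std devs from the history mean."""
    if len(history) < 2:
        return False  # Not enough history to estimate spread.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean  # Constant history: any change is unusual.
    return abs(value - mean) / stdev > z_threshold


# Hypothetical daily row counts for one table.
row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]
today = 600

print(rigid_check(today, 1, 10_000_000))  # loose fixed bounds still pass
print(is_anomaly(row_counts, today))      # far outside the usual range
```

The fixed bounds have to be wide enough never to fire on normal days, so they miss a 40% drop; the learned baseline adapts to what "normal" looks like for each metric.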

So, with Diff, we already compute and store detailed statistical profiles for every column in the table. Next, we are going to track those profiles over time.
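A column profile tracked over time might look roughly like the sketch below. The chosen statistics (count, nulls, distinct, mean, stdev), the `ColumnProfile` class, and the in-memory `history` store keyed by (table, column) are all hypothetical illustrations; the actual profiles Datafold computes are not described here.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ColumnProfile:
    count: int
    nulls: int
    distinct: int
    mean: Optional[float]
    stdev: Optional[float]


def profile_column(values: List[Optional[float]]) -> ColumnProfile:
    """Summarize one numeric column; None stands in for SQL NULL."""
    present = [v for v in values if v is not None]
    return ColumnProfile(
        count=len(values),
        nulls=len(values) - len(present),
        distinct=len(set(present)),
        mean=statistics.mean(present) if present else None,
        stdev=statistics.stdev(present) if len(present) > 1 else None,
    )


# Hypothetical store: (table, column) -> list of (date, profile) snapshots.
history: Dict[Tuple[str, str], List[Tuple[str, ColumnProfile]]] = {}


def record(table: str, column: str, date: str, values: List[Optional[float]]) -> None:
    history.setdefault((table, column), []).append((date, profile_column(values)))


record("orders", "amount", "2020-06-01", [10.0, 12.0, None, 12.0])
record("orders", "amount", "2020-06-02", [None, None, 11.0, 13.0])

# A per-day metric series that an anomaly check could consume.
null_rates = [p.nulls / p.count for _, p in history[("orders", "amount")]]
print(null_rates)
```

Storing compact profiles rather than raw data keeps the time series cheap, and each field (null rate, mean, distinct count) becomes a metric that a stats-based check can watch.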

Diff is the first tool we've built to get a wedge into the workflows of high-velocity data teams and start adding value, but it's just the beginning of the more comprehensive and, hopefully, valuable product we aspire to deliver.

Much appreciate your response