by jm1271 1492 days ago
Thanks for this post! Naive question: why not "just use Great Expectations"? At first blush GE seems like it has a lot of what you need out of the box: checks definable in YAML, extensibility, and connectors to many major data sources.

Was there something you all found lacking there which made "roll your own" the right approach here?

1 comment

As a software engineer new to the data space, I am baffled by why people recommend great_expectations. It has a lot of questionable dependencies that inflate image sizes and lead to conflicts at scale. It is also a very ambitious project that fails to deliver on many fronts, including documentation and basic data quality checks. The complexity of writing your own checks is way too high. There are a lot of very abstract concepts you have to understand before you can write a single line of code. If you think I’m wrong, stop now and go look at some of their code examples. You’re better off using Python’s built-in unittest to run a query and then make assertions on the result as a task in your DAG.
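
For anyone wondering what that might look like, here is a rough sketch and nothing more. It uses an in-memory SQLite database as a stand-in for a real warehouse connection, and the `orders` table, the `make_connection` helper, the specific checks, and the `run_checks` entry point are all made up for illustration. A scheduler task would call something like `run_checks()` and fail if it raises.

```python
import sqlite3
import unittest


def make_connection():
    # Hypothetical stand-in for a warehouse connection: an in-memory
    # SQLite database seeded with a few rows so the example is runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(1, 9.99), (2, 25.0), (3, 4.5)],
    )
    conn.commit()
    return conn


class OrdersQualityChecks(unittest.TestCase):
    def setUp(self):
        self.conn = make_connection()

    def tearDown(self):
        self.conn.close()

    def scalar(self, sql):
        # Run a query and return the first column of the first row.
        return self.conn.execute(sql).fetchone()[0]

    def test_table_is_not_empty(self):
        self.assertGreater(self.scalar("SELECT COUNT(*) FROM orders"), 0)

    def test_no_null_amounts(self):
        nulls = self.scalar("SELECT COUNT(*) FROM orders WHERE amount IS NULL")
        self.assertEqual(nulls, 0)

    def test_no_negative_amounts(self):
        negatives = self.scalar("SELECT COUNT(*) FROM orders WHERE amount < 0")
        self.assertEqual(negatives, 0)


def run_checks():
    # Entry point a DAG task could call; raising makes the task fail.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrdersQualityChecks)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    if not result.wasSuccessful():
        raise RuntimeError("data quality checks failed")
    return result
```

Each check is just a query plus an assertion, so adding a new one is a few lines and needs no extra dependencies beyond whatever database driver you already use.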