by skypanther 916 days ago
I only skimmed the article due to its length. But I didn't see anything like a comparison to other toolchains. Like, how is this better/different/worse than Airflow + dbt + Snowflake?

It does feel like it was lengthened with GPT. There are so many essay-style "now we propose to show that..." phrases; it's uncanny.

> we argue that we can rethink the current state of data transformation pipelines (intro)

> In the blog we will cover:

> We show a way how you can combine the best of both worlds

> We will tackle its impact and explain more in the following development section.

edit: yeah it's ChatGPT:

> Parts of this text were adeptly generated by ChatGPT but enhanced by real humans.

I guess it's the future: turn a tweet into a 5-page essay with your AI, so your readers can summarize it back to tweet length with their AI.

Hi,

Georg here, one of the authors.

Indeed, we used LLMs/GPT-4 for proofreading and for polishing the English (we are not native speakers).

We considered splitting the content into several posts, but decided that one long post was a better fit.

> > We show a way how you can combine the best of both worlds

In fact, I wrote that sentence by hand :)

Snowflake isn't local; you have to pay for the cloud. Airflow is Airflow: complexity and a steep learning curve. There is a whole industry trying to be as powerful as Airflow (or more) with less complexity and cleaner integration with modern things you might want to use, like k8s, although of course Airflow is still super popular and powerful.

The Dagster folks have some comparisons (of course, there are popular modern alternatives other than Dagster):

https://dagster.io/blog/dagster-airflow

https://dagster.io/vs/dagster-vs-airflow

Far more helpful to me than these essays on the benefits of a particular paradigm is a simple, minimal example in a repo that I can dig into and explore.

It just seems like one choice among many. The dagster-dbt library in particular seems like a slightly pointless wrapper around the dbt CLI and the JSON artifacts it creates. I've been using Airflow and astronomer-cosmos, which is not perfect, but I didn't enjoy using Dagster the last time I tried.

Interesting, why? What were the pain points?