by MontyCarloHall 1416 days ago
Dismissing Airflow for not being Astronomer is like dismissing Linux for not having the capabilities of a large-scale hypervisor.

Replace “Airflow” with “Linux,” “data engineers” with “systems programmers,” and “Astronomer” with your hypervisor of choice (Xen/VMWare/etc.), and you can see how absurd the author’s point is:

   My problem is that ~Airflow~ Linux was not designed to address [high-level systems architecture] problems. We don’t need a better [Linux], but we need a higher-level one: a system that enables ~data engineers~ systems programmers to think at a platform level.

   In fact, [Linux] is already displaced. [Linux] qua [Linux] is already obsolete, and it happened right within the [Linux] ecosystem. It’s called ~Astronomer~ Xen/VMWare/etc.

   If it sounds like you could simply replace [Linux] with basically any other ~job execution engine~ operating system, that’s because you could.

This is where the argument falls apart. Yes, for very large, complex deployments, higher-level orchestration is important, but the choice of low-level execution engine is also still hugely relevant, just as the choice of guest OS is still hugely relevant when discussing large deployments of VMs.

Furthermore, very few people actually need very large-scale deployments; user experience and capabilities at the low level are what most users actually care about.


Managed Airflow doesn't even solve any of the author's outlined frustrations. It keeps the "obscene" syntax, it's still stateless, it's not "decentralized" etc.

Honestly, the article is so disingenuous that it comes off like a paid-for puff piece for Astronomer. It's the article equivalent of the late-night infomercial guy who rips open a bag of potato chips like the Hulk because he doesn't have this special tool that's just four easy payments of $9.99.

FYI, the infomercials with strange tools fixing strange problems are usually aimed at old or disabled people. Opening a bag of chips with a ridiculous tool sounds stupid, but it might help a stroke survivor or someone with one arm. The sellers don't want to show those people struggling on screen, to avoid humiliating them, so you see perfectly healthy-looking young people spilling things as if they had some neurodegenerative disorder. Because the target audience might.

Not saying infomercial people are angels, of course, but I wanted to share this somewhat nonobvious context.

(To stretch the metaphor, an Airflow management system that gives everyone their own Airflow might be ridiculous, but it could make sense for companies where cooperation is difficult :))

Interesting, Astronomer was actually my last choice for orchestrator. We went with Dagster, but I didn't want to make the takeaway "Dagster solves these problems", because it doesn't directly. Astronomer was just the best foil for the "meta-orchestrator" space that seems to be evolving, and which _can_ address these problems.

The new TaskFlow API has been part of Airflow 2.0 since its release in 2020: https://airflow.apache.org/docs/apache-airflow/stable/tutori...

Agreed. The author is blaming Airflow for what are ultimately poor architecture decisions.

I will admit it's not easy to figure out best practices with Airflow, but if you make bad decisions and your system doesn't scale with the problem, you didn't understand the problem or how to solve it in the first place. The tools you chose are secondary to that.

You may not know very precisely the time constants you are dealing with in your problem until you give it a shot.

Honestly, we have to set up Airflow at my job for some data-log collection and processing. Which is fine, only I'm pretty sure we had exactly the same issue at my old job, and we fixed it in half a day, including testing and deployment, with a Perl script. And I think in this particular instance (GitLab logs) it was handled with 90% Awk. Meanwhile, my coworkers still have issues after almost a week (not all of this is on Airflow, but still).

I'm not saying Airflow is bad (we did set up a lot of Hadoop clusters and other Apache products at my old job, and our clients used Airflow a lot), but I think the evangelists are so good that they push Airflow for everything, and this is bad. OP did use Airflow for something it was not really designed for, and it sucked, but I do have the impression that tech writers and Apache evangelists deserve some of the blame.