Hacker News
by throwusawayus 1482 days ago
The creator of Prefect was an early major Airflow committer. Anyone know what motivated the substantial rewrite of Prefect? I had assumed the original version of Prefect was already supposed to fix some design issues in Airflow?

I'm a heavy Prefect user and was also very confused about the initial rewrite, even after reading several summaries. My best advice is to just try using 2.0 (Orion). Here's how I'd summarize the difference:

Prefect 1.0 feels like second-gen Airflow: less boilerplate, easy dynamic DAGs, better execution defaults, great local dev, and so on. It's saner, but you still feel the impedance mismatch of working with an orchestrator.

Prefect 2.0 is a first-principles rewrite that removes most of the friction from interacting with an orchestrator in the first place. Finally, your code can breathe.

I think you mean Prefect Orion/v2[0]. I'm curious too.

[0] https://www.prefect.io/orion/

Yes, the original 'Prefect' stack was written to address issues in Airflow. In that version the DAG was built by calling decorated tasks inside a flow context, which was pretty cool and worked well, but with Orion they moved to generating the DAG from plain code.
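To make the contrast concrete, here is a toy, stdlib-only sketch of the two styles. It is not Prefect's API: `Graph`, `Node` and `task` are hypothetical names. Inside a `Graph` context, calling a decorated function records a node instead of running it (roughly the 1.0 style); outside any context, the same function simply executes (roughly the Orion style).

```python
import contextvars
from functools import wraps

# The graph currently being built, if any (hypothetical stand-in for a flow context).
_current_graph = contextvars.ContextVar("current_graph", default=None)


class Node:
    """A deferred call: the undecorated function plus its (possibly deferred) arguments."""

    def __init__(self, fn, args):
        self.fn = fn
        self.args = args


class Graph:
    """Context manager that records task calls instead of executing them."""

    def __init__(self):
        self.nodes = []

    def __enter__(self):
        self._token = _current_graph.set(self)
        return self

    def __exit__(self, *exc):
        _current_graph.reset(self._token)
        return False

    def run(self, node):
        """Evaluate a node, first evaluating any Node arguments it depends on."""
        args = [self.run(a) if isinstance(a, Node) else a for a in node.args]
        return node.fn(*args)


def task(fn):
    @wraps(fn)
    def wrapper(*args):
        graph = _current_graph.get()
        if graph is None:
            return fn(*args)          # no context: plain eager call
        node = Node(fn, args)         # inside a context: record, don't run
        graph.nodes.append(node)
        return node
    return wrapper


@task
def add(x, y):
    return x + y


@task
def double(x):
    return 2 * x


# Deferred style: build the graph first, run it later.
with Graph() as g:
    result_node = double(add(1, 2))
print(len(g.nodes), g.run(result_node))   # 2 6

# Eager style: the same functions are just Python calls.
print(double(add(1, 2)))                  # 6
```

The deferred style lets an orchestrator inspect the whole graph before running it; the eager style lets ordinary Python control flow (loops, `if`s) decide what runs.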

Prefect is very cleanly written, well designed, and flexible. IMHO it is a platform that will be the next big thing in this area.

How do I know? I deployed Prefect as a static config gathering system across 4000 servers, both Linux and Windows. No other software stack came close, because one of Prefect's core concepts is 'expect to fail'. Tools like Ansible Tower die really quickly on large clusters, due to the normal number of failures and the incorrect assumption that most things will work (an assumption you can get away with on a small cluster).

I wish I got to use it in my current work but there is no use case. Yet.

You mean you used Prefect to fetch each node's system parameters/config?

Interesting use case. I use Prefect for data pipelines and never thought about that kind of use case.

I had many thousands of machines. I needed to collect disk size, RAM, software inventory, and some custom config, if present. Some machines ran Linux, some Windows.

With Prefect I created a task 'collect machine details for Windows', another 'collect machine details for Linux', and another 'collect software inventory'.

I have a list of machines in a database, so I create a task to get them. That task is a SQLAlchemy query, so I can pass the task a filter.
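A minimal sketch of such a parameterized lookup, assuming a made-up `machines` table with `hostname` and `os` columns. It uses the stdlib `sqlite3` module in place of SQLAlchemy so it runs on its own; in the setup described, the function would additionally be wrapped as a Prefect task.

```python
import sqlite3


def make_inventory_db():
    """Hypothetical in-memory inventory standing in for the real machine database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE machines (hostname TEXT PRIMARY KEY, os TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO machines (hostname, os) VALUES (?, ?)",
        [("web01", "linux"), ("web02", "linux"), ("ad01", "windows")],
    )
    return conn


def get_machines(conn, os_name=None):
    """Return hostnames, optionally filtered by OS; the filter is the task's parameter."""
    sql = "SELECT hostname FROM machines"
    params = ()
    if os_name is not None:
        sql += " WHERE os = ?"        # bound parameter, not string formatting
        params = (os_name,)
    sql += " ORDER BY hostname"
    return [row[0] for row in conn.execute(sql, params)]


conn = make_inventory_db()
print(get_machines(conn, "linux"))    # ['web01', 'web02']
print(get_machines(conn, "windows"))  # ['ad01']
```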

I get a list of Linux machines and pass it to one task to run. I get a list of Windows machines and pass it to another task.

Note that these two branches don't depend on each other.
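Because neither branch needs the other's output, both can run at the same time. A rough stdlib illustration with a thread pool; `collect_linux` and `collect_windows` are hypothetical placeholders that return canned data instead of actually contacting machines.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_linux(hosts):
    """Placeholder for 'collect machine details for Linux'."""
    return {h: {"os": "linux", "ok": True} for h in hosts}


def collect_windows(hosts):
    """Placeholder for 'collect machine details for Windows'."""
    return {h: {"os": "windows", "ok": True} for h in hosts}


linux_hosts = ["web01", "web02"]
windows_hosts = ["ad01"]

# Submit both independent branches at once, then merge their outputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    linux_future = pool.submit(collect_linux, linux_hosts)
    windows_future = pool.submit(collect_windows, windows_hosts)
    results = {**linux_future.result(), **windows_future.result()}

print(sorted(results))  # ['ad01', 'web01', 'web02']
```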

I have a task that filters good results from bad. I have another task that writes a list to a database.

Other tasks have credentials.

Another task writes errors to an error table: the machines that failed are filtered out of the results and fed into this task.
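The filter-and-write steps might look roughly like this. It is a sketch assuming each result is a dict with a hypothetical `ok` flag plus either `ram_gb` or `error`, and it uses in-memory `sqlite3` tables named `results` and `errors` as stand-ins for the real database.

```python
import sqlite3


def split_results(results):
    """Partition {host: result} into successes and failures."""
    good = {h: r for h, r in results.items() if r.get("ok")}
    bad = {h: r for h, r in results.items() if not r.get("ok")}
    return good, bad


def write_good(conn, good):
    conn.executemany(
        "INSERT INTO results (hostname, ram_gb) VALUES (?, ?)",
        [(h, r["ram_gb"]) for h, r in good.items()],
    )


def write_errors(conn, bad):
    conn.executemany(
        "INSERT INTO errors (hostname, message) VALUES (?, ?)",
        [(h, r.get("error", "unknown error")) for h, r in bad.items()],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (hostname TEXT, ram_gb INTEGER)")
conn.execute("CREATE TABLE errors (hostname TEXT, message TEXT)")

results = {
    "web01": {"ok": True, "ram_gb": 64},
    "web02": {"ok": False, "error": "ssh timeout"},
    "ad01": {"ok": True, "ram_gb": 32},
}
good, bad = split_results(results)
write_good(conn, good)
write_errors(conn, bad)
print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 2
print(conn.execute("SELECT hostname FROM errors").fetchall())      # [('web02',)]
```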

I wire all of the above together in a Prefect flow, and it builds a DAG that runs the flow. Everything that can run in parallel does, and everything that needs some other task's output waits for that input.
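Prefect handles this scheduling for you. As an illustration of the idea only (not how Prefect is implemented), here is a stdlib sketch using `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool: each task starts as soon as all of its dependencies have finished. The task names and the `tasks`/`deps` dictionary format are made up for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run each task once all of its dependencies have finished.

    tasks: {name: callable taking its dependencies' results as keyword arguments}
    deps:  {name: set of task names it depends on}
    """
    sorter = TopologicalSorter(deps)
    for name in tasks:
        sorter.add(name)              # include tasks that have no dependencies
    sorter.prepare()                  # raises CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Start every task whose inputs are now available.
            for name in sorter.get_ready():
                kwargs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], **kwargs)] = name
            # Wait for at least one task to finish, then unlock its dependents.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)
    return results


dag_tasks = {
    "linux_hosts": lambda: ["web01", "web02"],
    "windows_hosts": lambda: ["ad01"],
    "linux_details": lambda linux_hosts: {h: "linux" for h in linux_hosts},
    "windows_details": lambda windows_hosts: {h: "windows" for h in windows_hosts},
    "merged": lambda linux_details, windows_details: {**linux_details, **windows_details},
}
dag_deps = {
    "linux_details": {"linux_hosts"},
    "windows_details": {"windows_hosts"},
    "merged": {"linux_details", "windows_details"},
}
out = run_dag(dag_tasks, dag_deps)
print(sorted(out["merged"].items()))
# [('ad01', 'windows'), ('web01', 'linux'), ('web02', 'linux')]
```

The two host-list tasks start together, each details task starts as soon as its own list is ready, and `merged` waits for both branches.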

Tasks that fail can be retried by Prefect automatically, and intermediate results can be cached. And I get a nice GUI for everything; I can even schedule flows from the GUI.
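In Prefect, retries and caching are configured on the task itself. The hypothetical `retrying_cached` decorator below is only a stdlib stand-in showing the two behaviours: retry a call that raises, and reuse a successful result for the same arguments. `flaky_ram_check` simulates a single transient failure.

```python
import functools
import time


def retrying_cached(retries=3, delay_seconds=0.0):
    """Hypothetical decorator: retry on exception, cache successful results by arguments."""
    def decorator(fn):
        cache = {}

        @functools.wraps(fn)
        def wrapper(*args):
            if args in cache:
                return cache[args]           # reuse an earlier successful result
            for attempt in range(retries + 1):
                try:
                    result = fn(*args)
                    break
                except Exception:
                    if attempt == retries:
                        raise                # out of retries: surface the failure
                    time.sleep(delay_seconds)
            cache[args] = result
            return result

        return wrapper
    return decorator


calls = {"count": 0}


@retrying_cached(retries=2)
def flaky_ram_check(host):
    """Fails on the first attempt, succeeds on the second (simulated transient error)."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("transient failure")
    return {"host": host, "ram_gb": 64}


print(flaky_ram_check("web01"))  # {'host': 'web01', 'ram_gb': 64}
print(flaky_ram_check("web01"))  # served from the cache, no new call
print(calls["count"])            # 2
```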

Very interesting! Thank you for the details.