Hacker News
by zaptheimpaler 45 days ago
I have several sources of data I want to fetch, retry, and process periodically. Like exporting Claude chats into .md files that go to Obsidian, fetching Garmin data from the API and processing it for a custom tool, exporting replays for a game, maybe even running some browser automation to get bank CSVs. I have some ad-hoc Python scripts for all of this but no central way to manage them, schedule them, handle errors and retries, store the original data and processed versions, resume from the last point, etc. Is a workflow engine useful for something like that?
2 comments

Agree with other response, look at Dagster for this.

If you want to roll your own, you build a dependency graph (a dict) of the functions you want to call. Python already has graphlib.TopologicalSorter built in, which will work out the order to run them in. Throw in logging and the tenacity library for retries and you’re set.
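Rough sketch of what that could look like. The task names (fetch, process, export) and the run_all / with_retries helpers are made up for illustration, and the retry loop is a plain stdlib stand-in so the snippet runs without extra installs; tenacity's retry decorator would normally take its place.

```python
import logging
import time
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def with_retries(func, attempts=3, delay=0.0):
    """Call func(), retrying on any exception, at most `attempts` times.

    Stdlib stand-in for a retry library such as tenacity.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            log.warning("%s failed (attempt %d/%d): %s",
                        func.__name__, attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            time.sleep(delay)


# Hypothetical tasks; real ones would hit an API, write files, etc.
def fetch():
    log.info("fetching raw data")


def process():
    log.info("processing raw data")


def export():
    log.info("exporting processed data")


TASKS = {"fetch": fetch, "process": process, "export": export}

# The dependency graph: task name -> set of tasks that must run first.
GRAPH = {"process": {"fetch"}, "export": {"process"}}


def run_all(graph, tasks):
    """Run every task in dependency order and return that order."""
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        log.info("running %s", name)
        with_retries(tasks[name])
    return order
```

Calling run_all(GRAPH, TASKS) runs fetch, then process, then export. This doesn't cover scheduling or resuming from the last point; you'd still need cron (or similar) and some saved state for that.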

Check out Airflow and Dagster.

I've used Dagster but I can't compare it to Airflow. In terms of DX, I've found Dagster pretty easy to use. Instead of writing their own DSL, they have a Python library that lets you tag your pre-made functions with @op and string them together into a DAG.