by nabdab 2406 days ago
We’re working on a Jupyter notebook setup as well, but for us it’s only analytical notebooks and dashboards (through Voila). How have you tied in triggered dataflows and ETL?

Yes. Well, the trick is using Dask with a @retry decorator. You can easily listen for changes on APIs and SQL tables with a while loop. Scheduling is easy; triggering, retrying, and notifying are the harder parts.
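For anyone curious what that looks like, here is a minimal sketch of the retry-plus-polling idea using only the standard library. It is not the commenter's actual code: `retry`, `watch`, `fetch_state`, `on_change`, and `max_polls` are hypothetical names chosen for illustration, and Dask is left out so the snippet runs on its own (the decorated function could presumably be handed to Dask afterwards).

```python
import time
from functools import wraps


def retry(times=3, delay=0.0, exceptions=(Exception,)):
    """Hypothetical helper: re-run the wrapped function up to `times` times."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        # Out of attempts: re-raise so the caller can notify someone.
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


def watch(fetch_state, on_change, interval=1.0, max_polls=None):
    """Hypothetical helper: poll `fetch_state` and fire `on_change` on a new value.

    `fetch_state` stands in for an API call or a SQL query such as
    "SELECT max(updated_at) FROM some_table" (an assumed example).
    `max_polls=None` loops forever; a number makes the loop testable.
    """
    last = fetch_state()
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = fetch_state()
        if current != last:
            on_change(last, current)  # this is the "trigger"
            last = current
        polls += 1
```

The polling loop covers the triggering part, the decorator covers retrying, and the re-raise on the last attempt is the natural place to hook in a notification.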
One potential approach is Apache Airflow.
Prefect was started by one of Airflow's core contributors, I hear.
I have worked with Airflow extensively, but this is the first time I've heard of Prefect, and my mind is blown! Looking at the docs, it seems they have resolved most of the things we had to work around in Airflow. I definitely need to look into it more deeply. Thank you!
Do you have a personal contact? You seem to be interested in similar things to me. By the way, if you are OK with a bit of custom code, you can do a lot of what Airflow/Prefect do with dask.delayed.
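A rough sketch of what the dask.delayed approach can look like, assuming a toy three-step pipeline. The `extract`, `transform`, and `load` functions and their data are made up for illustration; only `dask.delayed` and `dask.compute` are real Dask calls.

```python
import dask
from dask import delayed


@delayed
def extract():
    # Stand-in for reading from an API or a SQL table (assumed data).
    return [1, 2, 3, 4]


@delayed
def transform(rows):
    return [r * 10 for r in rows]


@delayed
def load(rows):
    # Stand-in for writing to a warehouse; returns the number of rows written.
    return len(rows)


# Calling delayed functions only builds a task graph; nothing runs yet.
rows = transform(extract())
loaded = load(rows)
total = delayed(sum)(rows)

# compute() executes the graph; the shared `rows` task runs once for both outputs.
n_loaded, total_value = dask.compute(loaded, total)
```

The dependency graph comes for free from ordinary function calls, which is the part that overlaps with Airflow/Prefect. Scheduling, retries, and notifications would still be the custom code mentioned above.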
I'm a big fan of Dask, though I haven't used it in a deployment/scheduling context.

I'd love to connect.