by fifilura 1279 days ago
Thank you for your comment!

Handling a DAG of dependencies in a batch system is exactly what Airflow does: do this, wait here, and when it is done, do that. So the DAG acronym is entirely appropriate for this discussion.

And you can have tasks in Airflow that send out mails, for example a quarterly mail that depends on the quarterly summary having been executed.
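To make the "do this, wait here, then do that" ordering concrete, here is a minimal sketch that uses Python's standard-library graphlib instead of Airflow itself. The task names are made up for illustration, and a real scheduler would also handle retries, timing and actually running the tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it must wait for.
dependencies = {
    "ingest_raw_data": set(),
    "quarterly_summary": {"ingest_raw_data"},
    "send_quarterly_mail": {"quarterly_summary"},
}


def run_order(deps):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

The mail task can only come after the summary, which can only come after the ingest. That is the whole point of describing the batch as a DAG.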

So my idea here is to

- Ingest all raw data into e.g. BigQuery.
- Combine it the way you want with SQL.
- Add tables with email addresses etc. for customer adaptations, along with parameters.
- Join with those tables to create custom adaptations.
- Add output layer stuff (email, upload to custom file system etc.).
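The steps above could look roughly like this. It is only a sketch: it uses an in-memory SQLite database as a stand-in for BigQuery, and the table names, columns and sample rows (raw_sales, customer_settings) are invented for the example.

```python
import sqlite3


def build_outputs(conn):
    """Join summarized data with per-customer settings, one row per delivery."""
    return conn.execute(
        """
        SELECT c.email, c.currency, SUM(s.amount) AS total
        FROM raw_sales AS s
        JOIN customer_settings AS c ON c.customer_id = s.customer_id
        GROUP BY c.customer_id, c.email, c.currency
        ORDER BY c.email
        """
    ).fetchall()


# In-memory SQLite stands in for the warehouse; all data here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_sales (customer_id INTEGER, amount REAL);
    CREATE TABLE customer_settings (customer_id INTEGER, email TEXT, currency TEXT);
    INSERT INTO raw_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
    INSERT INTO customer_settings VALUES
        (1, 'a@example.com', 'EUR'),
        (2, 'b@example.com', 'USD');
    """
)

rows = build_outputs(conn)
for email, currency, total in rows:
    # The output layer (send mail, upload file) would hook in here.
    print(email, currency, total)
```

The customer-specific behavior lives in a table, not in code, so adding a customer or changing a parameter is a row change and the logic stays in SQL.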

Now you have one place and one language for your logic. That is, given that this part of the system is inherently batched. If it is real time/streaming, it will not work.

The "can't do the fuzzy parts with strings and files and system communications" part needs to be defined. This is the rot in the system, the ghost in the closet that everyone is afraid of approaching.