by Frummy
1283 days ago
What I mean when I say DAG is just shorthand for describing the architecture of the data flows. That's what it looks like when you draw up the jobs and how they depend on each other. So I took a linguistic shortcut; mainframes don't support objects like graph data structures in an accessible and efficient way anyway. All the actual customer data is stored in databases, which is why SQL is needed, and to access those databases you call various in-house COBOL systems with the right parameters. SQL is good for the type of business logic these systems do, for example niche cases like FATCA and CRS tax reporting and tons of legal details like that. To organize the business requirements, it's embedded in the COBOL programs as DB2 SQL.

When you want to send out mails daily, monthly, quarterly, yearly, and so on, in ten different modes depending on a parameter, with different variants depending on user and organization data, that's an orchestration done in the mainframe OPC scheduler across various applications with JCL and PROCLIB. I think SQL is too strictly logical and can't do the fuzzy parts with strings and files and system communications to be economical as a full replacement.

I'm pretty new, so honestly I don't understand most of this stuff, which is why I'm throwing out a word salad here. We do have Java and .NET parts that we communicate with, built around bought solutions that needed integration. Airflow would have to be integrated in the same way but can't be a full replacement.
|
Handling a DAG architecture in a batched system is exactly what Airflow does. Do this, wait here, and when it is done, do that. So the term DAG is entirely appropriate for this discussion.
And you can have tasks in Airflow that send out mails, for example a quarterly mail that depends on the quarterly summary having been executed first.
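To make "do this, wait here, when it is done do that" concrete, here is a minimal sketch in plain Python. It uses the standard library's `graphlib` rather than Airflow itself, so it only shows the ordering idea, not scheduling or retries. The job names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the set of jobs it must wait for.
dependencies = {
    "load_raw_data": set(),
    "quarterly_summary": {"load_raw_data"},
    "send_quarterly_mail": {"quarterly_summary"},
    "archive_output": {"send_quarterly_mail"},
}


def run_order(deps):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for job in order:
    # A real scheduler would launch the job here and wait for it to finish.
    print("running", job)
```

A scheduler like Airflow adds the calendar (daily, quarterly, and so on), retries, and monitoring on top of this same dependency structure.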
So my idea here is to:
- Ingest all raw data into e.g. BigQuery.
- Combine it the way you want with SQL.
- Add tables with email addresses etc. for customer adaptations, along with parameters.
- Join with those tables to create custom adaptations.
- Add output layer stuff (email, upload to custom file system, etc.).
Now you have one place and one language for your logic. That is, assuming this part of the system is inherently batched. If it is real-time/streaming, it will not work.
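A rough sketch of the steps above, with an in-memory SQLite database standing in for BigQuery so it runs anywhere. The table names, columns, and the `render` helper are all hypothetical, and the "output layer" here only builds the messages instead of actually sending them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Steps 1 and 3: raw data plus a per-customer parameter table (made-up schema).
conn.executescript("""
CREATE TABLE raw_balances (customer_id INTEGER, quarter TEXT, amount REAL);
CREATE TABLE customer_prefs (customer_id INTEGER, email TEXT, mode TEXT);
INSERT INTO raw_balances VALUES
    (1, '2021Q1', 100.0), (1, '2021Q1', 50.0), (2, '2021Q1', 75.0);
INSERT INTO customer_prefs VALUES
    (1, 'a@example.com', 'summary'), (2, 'b@example.com', 'detailed');
""")

# Steps 2 and 4: combine with SQL and join in the customer parameters.
rows = conn.execute("""
SELECT p.email, p.mode, b.quarter, SUM(b.amount) AS total
FROM raw_balances AS b
JOIN customer_prefs AS p ON p.customer_id = b.customer_id
GROUP BY p.email, p.mode, b.quarter
ORDER BY p.email
""").fetchall()


def render(email, mode, quarter, total):
    """Step 5 (output layer): build one message; the variant depends on `mode`."""
    body = f"{quarter} total: {total:.2f}"
    if mode == "detailed":
        body += " (detailed breakdown attached)"
    return {"to": email, "body": body}


messages = [render(*row) for row in rows]
for message in messages:
    print(message)
```

The point of the design is that the per-customer variation lives in a table that gets joined in, so adding a new mode or recipient is a data change rather than a new job.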
The "can't do the fuzzy parts with strings and files and system communications" needs to be defined. This is the rot in the system, the ghost in the closet that everyone is afraid of approaching.