by fifilura 1279 days ago
I am curious, and I don't have insight into these systems. But you first mention DAGs, and then you say rewrite to Java or .NET.

When I think about DAGs today, I think about Airflow. And SQL.

Would it be a better match to rewrite these systems in SQL and Airflow? SQL for the logic and Airflow for the batch processing.

I know for many (particularly those who mention Java and .NET) SQL is just a place where you fetch and store your data. But once you start building systems with it, you will soon realize it contains nothing more and - at the same time - nothing less than what you actually need for mangling your data in a terse way.

I know there are many reasons to frown on SQL for this, and I am fine with any comments about it. I think it can be a start of a good discussion. Nothing is black or white.

1 comments

So what I mean when I say DAG is just a shorthand for describing the architecture of the data flows. That's what it looks like when you draw up the jobs and how they depend on each other. So I just took a linguistic shortcut; mainframes don't support objects such as graph data structures in an accessible and efficient way anyway.

All the actual customer data is stored in databases, so that's why SQL is needed. To access those databases you call various in-house COBOL systems with the right parameters. SQL is good for the type of business logic that these systems do, for example niche cases like FATCA and CRS tax reporting and tons of legal details like that. To organize the business requirements, it's embedded in the COBOL programs as DB2 SQL. When you want to send out mails daily, monthly, quarterly, yearly, and so on, in ten different modes depending on a parameter, with different variants depending on user and organization data, that's an orchestration done in the mainframe OPC scheduler with various applications using JCL and PROCLIB. I think SQL is too strictly logical and can't do the fuzzy parts with strings and files and system communications to be economical as a full replacement. I'm pretty new, so honestly I don't understand most of this stuff, which is why I'm throwing out a word salad here.

We do have Java and .NET parts that we communicate with, built around bought solutions that needed integration. Airflow would have to be integrated in the same way but can't be a full replacement.

Thank you for your comment!

Handling the DAG architecture in a batched system is exactly what Airflow does. Do this, wait here, when it is done do that. So the DAG acronym is entirely appropriate for this discussion.

And you can have tasks in Airflow that send out mails quarterly and that depend on the quarterly summary having been executed first.
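
To make the "do this, wait here, when it is done do that" idea concrete, here is a minimal sketch using only the Python standard library, not Airflow itself. The job names are made up for illustration, and a real scheduler would also handle calendars, retries and failures.

```python
from graphlib import TopologicalSorter

# Hypothetical job names. Each key maps to the set of jobs it must wait for.
dependencies = {
    "ingest_raw_data": set(),
    "quarterly_summary": {"ingest_raw_data"},
    "send_quarterly_mails": {"quarterly_summary"},
}


def run_order(deps):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for job in order:
    print("run", job)
```

In Airflow the same kind of ordering is declared between tasks, for example with the `>>` operator, and the scheduler works out when each task is allowed to start.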

So my idea here is to:

- Ingest all raw data into e.g. BigQuery.
- Combine it the way you want with SQL.
- Add tables with email addresses etc. for customer adaptations, along with parameters.
- Join with those tables to create custom adaptations.
- Add output layer stuff (email, upload to custom file system etc.).
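
The steps above can be sketched roughly like this. It uses an in-memory SQLite database as a stand-in for BigQuery so it runs anywhere, and the table names, columns and data are all invented for the example. The output layer is just a print here.

```python
import sqlite3

# In-memory SQLite as a stand-in for the warehouse (assumption, not BigQuery).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_transactions (customer_id INTEGER, quarter TEXT, amount REAL);
CREATE TABLE customer_params (
    customer_id INTEGER PRIMARY KEY, email TEXT, report_mode TEXT
);
INSERT INTO raw_transactions VALUES
    (1, '2021Q1', 100.0), (1, '2021Q1', 50.0), (2, '2021Q1', 70.0);
INSERT INTO customer_params VALUES
    (1, 'a@example.com', 'detailed'), (2, 'b@example.com', 'summary');
""")

# Combine the raw data with SQL and join in the per-customer parameters.
rows = conn.execute("""
SELECT p.email, p.report_mode, t.quarter, SUM(t.amount) AS total
FROM raw_transactions AS t
JOIN customer_params AS p ON p.customer_id = t.customer_id
GROUP BY p.email, p.report_mode, t.quarter
ORDER BY p.email
""").fetchall()

# Output layer: a real job would send mail or upload a file here.
for email, mode, quarter, total in rows:
    print(f"to={email} mode={mode} quarter={quarter} total={total}")
```

The point is that the "ten different modes depending on a parameter" part becomes rows in a parameter table that you join against, instead of branches spread across programs.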

Now you have one place and one language for your logic, given that this part of the system is inherently batched, that is. If it is real-time/streaming it will not work.

The "can't do the fuzzy parts with strings and files and system communications" needs to be defined. This is the rot in the system, the ghost in the closet that everyone is afraid of approaching.