by stoppingin 1279 days ago
I've been working in environments that use mainframe systems for a long time. I've read COBOL code to get an idea of what's happening downstream, but never actually written a line of COBOL myself. I've always wanted to know: In a modern context, does COBOL have any advantages for applications like transaction processing?

I understand that it's not ideal from a language point of view, but does it have any properties that make it a great fit for the job? Does COBOL on a mainframe have particular properties that make it good for concurrency, validation, etc?

2 comments

Basically, it's extremely reliable and deterministic. Mainframes just work and stay up indefinitely. If you want to do a tax report or send out scheduled mails on 5,000,000 accounts, COBOL is the right tool for that sort of job. It's just shuffling data through, let's call it, a DAG of batch jobs with SQL and restart checkpoints sprinkled in. When it fails, it's due to human error in the code or in a system it relies on, because bugs in the language itself are nonexistent. Updates to the code are mostly driven by regulation, compliance, and keeping the business running.
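To make "a DAG of batch jobs with restart checkpoints" concrete, here is a minimal Python sketch of the idea. It is not how a mainframe scheduler is implemented. The job names, the `run_batch` helper, and the in-memory `checkpoint` set are all hypothetical; a real system would persist the checkpoint somewhere durable.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
JOBS = {
    "extract_accounts": set(),
    "compute_tax": {"extract_accounts"},
    "build_report": {"compute_tax"},
    "send_mail": {"build_report"},
}


def run_batch(jobs, actions, checkpoint):
    """Run jobs in dependency order, skipping any already in `checkpoint`.

    `checkpoint` is a set of completed job names. It is updated in place,
    so a rerun after a failure resumes where the previous run stopped.
    """
    executed = []
    for name in TopologicalSorter(jobs).static_order():
        if name in checkpoint:
            continue  # finished in an earlier run, do not repeat it
        actions[name]()  # may raise; jobs completed so far stay recorded
        checkpoint.add(name)
        executed.append(name)
    return executed
```

If `build_report` raises on the first run, the checkpoint still holds `extract_accounts` and `compute_tax`, so a second call with the same checkpoint only runs `build_report` and `send_mail`.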

But mostly, it's just impossibly expensive to rewrite a humongous spaghetti of societally critical systems from COBOL to something modern such as Java or .NET.

I am curious, and I don't have any insight into these systems. But you first mention DAGs, and then you say rewrite to Java or .NET.

When I think about DAGs today, I think about Airflow. And SQL.

Would it be a better match to rewrite these systems in SQL and Airflow? SQL for the logic and Airflow for the batch processing.

I know that for many (particularly those who mention Java and .NET) SQL is just a place where you fetch and store your data. But once you start building systems with it, you will soon realize it contains nothing more and - at the same time - nothing less than what you actually need for mangling your data in a terse way.

I know there are many reasons to frown on SQL for this, and I am fine with any comments about it. I think it can be a start of a good discussion. Nothing is black or white.

What I mean by DAG is just shorthand for describing the architecture of the data flows. That's what it looks like when you draw up the jobs and how they depend on each other. So I took a linguistic shortcut; mainframes don't support objects such as graph data structures in an accessible and efficient way anyway.

All the actual customer data is stored in databases, so that's why SQL is needed. To access those databases you call various in-house COBOL systems with the right parameters. SQL is good for the type of business logic that these systems do, for example niche cases like FATCA and CRS tax reporting and tons of legal details like that; to organize the business requirements, it's embedded in the COBOL programs as DB2 SQL. When you want to send out mails daily, monthly, quarterly, yearly, and so on, in ten different modes depending on a parameter, with different variants depending on user and organization data, that's an orchestration done in the mainframe OPC scheduler with various applications using JCL and PROCLIB. I think SQL is too strictly logical and can't do the fuzzy parts with strings and files and system communications to be economical as a full replacement. I'm pretty new, so honestly I don't understand most of this stuff, which is why I'm throwing out a word salad here.

We do have Java and .NET parts that we communicate with, built around bought solutions that needed integration. Airflow would have to be integrated in the same way, but it can't be a full replacement.

Thank you for your comment!

Handling the DAG architecture in a batched system is exactly what Airflow does: do this, wait here, when it is done do that. So the term DAG is entirely appropriate for this discussion.

And you can have tasks in Airflow that send out mails, e.g. a quarterly mail task that depends on the quarterly summary having been executed.

So my idea here is to

- Ingest all raw data into e.g. BigQuery.
- Combine it the way you want with SQL.
- Add tables with email addresses etc. for customer adaptations, along with parameters.
- Join with those tables to create custom adaptations.
- Add output layer stuff (email, upload to custom file system, etc.).

Now you have one place and one language for your logic. That is, given that this part of the system is inherently batched. If it is real-time/streaming, it will not work.
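A rough sketch of that pipeline, with Python's built-in `sqlite3` standing in for BigQuery so it runs anywhere. The table names, columns, sample rows, and the `render_mail` helper are all made up for illustration; only the shape (raw data, a parameter table, a join, a thin output layer) comes from the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse such as BigQuery

# Ingest raw data and add a parameter table (hypothetical schema and rows).
conn.executescript("""
    CREATE TABLE raw_transactions (account_id INTEGER, amount_cents INTEGER);
    CREATE TABLE customer_prefs (account_id INTEGER, email TEXT, mode TEXT);
    INSERT INTO raw_transactions VALUES (1, 1000), (1, 250), (2, 4000);
    INSERT INTO customer_prefs VALUES
        (1, 'a@example.com', 'summary'),
        (2, 'b@example.com', 'detailed');
""")

# Combine and join in SQL: one row per customer, carrying their parameters.
rows = conn.execute("""
    SELECT p.email, p.mode, SUM(t.amount_cents) AS total_cents
    FROM raw_transactions AS t
    JOIN customer_prefs AS p ON p.account_id = t.account_id
    GROUP BY p.account_id, p.email, p.mode
    ORDER BY p.account_id
""").fetchall()


def render_mail(email, mode, total_cents):
    """Output layer: turn one result row into (recipient, body)."""
    amount = f"{total_cents // 100}.{total_cents % 100:02d}"
    if mode == "summary":
        return email, f"Total: {amount}"
    return email, f"Detailed statement, total: {amount}"


messages = [render_mail(*row) for row in rows]
```

The logic that decides what each customer gets lives in the query and the parameter table; the Python part only formats and hands off the result.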

The "can't do the fuzzy parts with strings and files and system communications" needs to be defined. This is the rot in the system, the ghost in the closet that everyone is afraid of approaching.

There's a good discussion of this at https://medium.com/the-technical-archaeologist/is-cobol-hold....

It starts off by suggesting that COBOL's superiority stems from supporting Binary Coded Decimal as a language-level element, rather than having to import it via a library (the overhead of which really starts to matter at the volume of transactions COBOL is typically required to handle). But it then broadens the discussion to argue that the intrinsic shape of the COBOL environment ("stack allocation, pointers, unions with no run-time cost of conversion between types, and no run-time dispatching or type inference") is fundamentally different from languages like Java or C#, and that those differences provide performance benefits which cannot be easily obtained in those other languages.
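For anyone wondering why decimal arithmetic matters for money at all, here is a small Python sketch. Python's `decimal` module is exactly the "imported via a library" case rather than native BCD, so this only illustrates the base-10 semantics, not COBOL's performance. The function names are hypothetical.

```python
from decimal import Decimal


def total_float(n):
    """Add 0.10 to a binary floating-point total n times."""
    total = 0.0
    for _ in range(n):
        total += 0.10  # 0.10 has no exact binary representation
    return total


def total_decimal(n):
    """Add 0.10 to a base-10 decimal total n times."""
    total = Decimal("0.00")
    for _ in range(n):
        total += Decimal("0.10")  # exact in base 10
    return total
```

With ten additions the decimal total is exactly `Decimal("1.00")`, while the float total ends up slightly off from `1.0`. Spread that drift over millions of transactions and you see why fixed-point decimal is treated as a requirement rather than a nicety.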