by mercantile 2181 days ago
If you don't know COBOL, Wikipedia has a pretty excellent overview of a lot of its basic syntax here:

https://en.wikipedia.org/wiki/COBOL#Features

It's very verbose, and includes a lot of opaque language that almost seems designed to confuse (I know it wasn't actually, but it's pretty bad). For example, an alphanumeric variable type is called a `PIC` for "PICTURE". You can store an alphanumeric as a `PIC A`, or a numeric-only using a `PIC X`. Fields in a record get a "level number" that defines their behavior, and you just have to memorize which does what. 01 is a top-level record. 05 defines a subgroup. 88 is a conditional record.

There are many parts of the language that are clearly anti-features in retrospect. A 66 level number allows you to redefine a field in a previously defined record. Convenient in some cases maybe, but something that will clearly lead to maintainability problems as the shape of a record doesn't match its original definition in code. Another example is that COBOL has a huge vocabulary, with the original idea that you could write things as much like English as possible (e.g. use a `GREATER THAN` instead of a `>`), which is one of those things that probably seemed like a good idea at one point, but which every modern language has abandoned or is abandoning (through the use of linters, etc.).

The article makes quite a few salient points, but on the other hand, if there's ever been a language that got almost everything wrong in about as objective a sense as you can get when speaking about these things, it's this one.

Instead of inspiring the next generation of COBOL programmers, IMO there's a good alternative argument to be made for getting a bunch of smart people together to write a transpiler that could transform huge legacy COBOL codebases into something more maintainable, like how the Go team transpiled their compiler from C to Go, and relegating this language to the history books. Obviously very difficult, but something that'd pay off in the long run.

3 comments

It helps to understand COBOL in the context of its environment: punchcards containing a single line of code, hierarchical (e.g. IMS) and network (e.g. IDMS) databases, the intended application being business data batch processing on mainframes, etc. In this context those field and record specifiers (etc.) make more sense.

Sorry to criticize, but as a daily maintainer/writer of COBOL, your characterization is a bit off. Besides technical inaccuracies (PIC 9 is for numerics, 88 is a conditional value in a field) you don't seem to grasp what's truly right or wrong with COBOL. Nor does the article. For example, ime no-one uses 66 levels, as the REDEFINES clause achieves the same result. While there _can_ be issues with maintaining code with overlapping definitions, _their use is deliberate_. For example, data read from a file could have two or more different record structures. It is read into one place in memory and the redefinition applies different formats/views to the same memory. The correct one is used as needed. There is no problem. You might be right that there are some anti-features, but like GOTO they simply aren't used in real codebases.
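
To make the REDEFINES point concrete, here is a minimal sketch, assuming GnuCOBOL in free format (`cobc -x -free`). The record layout, field names and sample data are all invented for illustration; a real program would read the records from a file rather than move literals around. It also shows PIC 9 for numerics and 88 levels as named values of a field:

```cobol
identification division.
program-id. redef-demo.

data division.
working-storage section.
*> One 20-byte buffer; the first byte says which layout applies.
01 input-record.
   05 record-type          pic x.
      *> 88 levels name values of record-type; they hold no storage.
      88 is-customer       value "C".
      88 is-payment        value "P".
   05 record-body          pic x(19).
   *> Two views of the same 19 bytes; nothing is copied.
   05 customer-view redefines record-body.
      10 customer-id       pic 9(5).
      10 customer-name     pic x(14).
   05 payment-view redefines record-body.
      10 payment-id        pic 9(5).
      10 amount-cents      pic 9(7).
      10 filler            pic x(7).

procedure division.
    *> Hypothetical sample data standing in for records read from a file.
    move "C00042Ada Lovelace  " to input-record
    perform show-record
    move "P000070001999       " to input-record
    perform show-record
    stop run.

show-record.
    *> Pick the view that matches the type byte.
    evaluate true
        when is-customer
            display "customer " customer-id " " customer-name
        when is-payment
            display "payment " payment-id " cents " amount-cents
        when other
            display "unknown record type"
    end-evaluate.
```

The type byte decides which view is valid, so the overlap is a deliberate, documented part of the layout rather than an accident.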

The real problem with COBOL imo, is that the language hasn't much improved since its creation. While there have been some useful tweaks and changes, it has missed out on major items such as variable scoping/user-defined functions (all variables are global - the workaround is to use sub-programs and pass data in the call). Even "object-oriented" functionality as mentioned in the article was badly tacked on to the language and only complicates it without really adding much in terms of capability. Also, the language doesn't really have strings as in other languages, and consequently lacks the applicable functions.
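
A rough sketch of that sub-program workaround, assuming a compiler that accepts nested programs and free format (I believe GnuCOBOL does with `cobc -x -free`). The program names, fields and the 20% rate are made up for the example:

```cobol
identification division.
program-id. tax-demo.

data division.
working-storage section.
01 net-cents        pic 9(7) value 10000.
01 gross-cents      pic 9(7).

procedure division.
    *> Data reaches the sub-program only through the USING list.
    call "addtax" using net-cents gross-cents
    display "gross cents: " gross-cents
    stop run.

*> Contained program: its working-storage is private to it.
identification division.
program-id. addtax.

data division.
working-storage section.
01 tax-percent      pic 99 value 20.

linkage section.
*> These describe the caller's storage, passed by reference.
01 net-in           pic 9(7).
01 gross-out        pic 9(7).

procedure division using net-in gross-out.
    compute gross-out = net-in + net-in * tax-percent / 100
    goback.
end program addtax.
end program tax-demo.
```

Here `tax-percent` is invisible to the outer program, and `addtax` cannot touch `net-cents` or `gross-cents` except through its parameters. That is about as close to local scope as you get, at the cost of a whole extra program per "function".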

Another major point that the article misses is that mainframe COBOL (most COBOL is/was mainframe and that means IBM) which does anything more than straightforward "read file, process contents, generate report" is inextricably linked to its environment. The article talks about poor form handling. COBOL doesn't do form handling. That would be CICS or IMS/DC software which the COBOL interfaces with. There is heavy integration in the code, but this is like confusing Java with WebSphere or Tomcat. This is why it is so difficult to port COBOL systems to another language/environment. The COBOL code can be mechanically converted. But any complex system has calls to databases, transaction processing, and is built with batch jobs relying on JCL and system utilities. Replacing the environment while retaining the integration and reproducing the functionality is the hard part.

> COBOL doesn't do form handling. That would be CICS or IMS/DC software which the COBOL interfaces with

The COBOL 2002 standard includes form handling ("SCREEN SECTION"). However, IBM COBOL implementations never included that, and so you are correct that on IBM mainframes stuff like CICS and IMS/DC is used instead. Some COBOL compilers on other platforms did implement "SCREEN SECTION", e.g. Micro Focus COBOL on Windows/Unix, DEC COBOL on OpenVMS, Tandem NonStop COBOL, and COBOL software developed for those platforms did use it. It is also true that non-IBM mainframe COBOL is likely a minority of all COBOL software still in use.

GNU COBOL also implements it, and I have to admit that it's kind of fun to play around with.

It's actually not that complicated. It's just different and, coming from traditional languages, strange.

Some technical mistakes in your post were already raised by another reply, and I just want to point out how the levels actually work. I've played around with COBOL on and off out of casual interest, but when the O'Reilly COBOL book became available to me recently I decided to read it just for fun.

As it turns out, COBOL mixes the concept of variables and report definitions. To paraphrase an example from the book[1], which I can recommend:

    01 date-of-birth.
      02 year.
        03 century pic 99.
        03 year-in-century pic 99.
      02 filler pic x value "-".
      02 month pic 99.
      02 filler pic x value "-".
      02 day pic 99.

With that definition, you can read and write values to and from each individual field, but also access the higher level ones as a combined field. For example, if I assign a full ISO 8601 date to the variable date-of-birth, like so:

    move "2010-01-02" to date-of-birth

I can then read the year by simply reading the corresponding variable:

    display "year is ", year

It all becomes much clearer once one realises that variable definitions in COBOL are conceptually field definitions for reports and fixed column storage formats. If the task at hand matches the way COBOL thinks of data, things are pretty simple. Once you want to move outside that realm, things become harder.
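
For anyone who wants to try this, here is the same idea as a complete program. It is only a sketch, assuming GnuCOBOL in free format (`cobc -x -free`). I've prefixed the field names with `dob-`, which is my own change and not from the book, because `DAY` is a reserved word and won't compile as a data name:

```cobol
identification division.
program-id. dob-demo.

data division.
working-storage section.
*> Ten bytes in total, laid out as fixed columns: CCYY-MM-DD.
01 date-of-birth.
   02 dob-year.
      03 dob-century          pic 99.
      03 dob-year-in-century  pic 99.
   02 filler                  pic x value "-".
   02 dob-month               pic 99.
   02 filler                  pic x value "-".
   02 dob-day                 pic 99.

procedure division.
    *> A group move copies the ten characters straight over the layout.
    move "2010-01-02" to date-of-birth
    display "year is " dob-year
    display "century is " dob-century
    display "month is " dob-month
    *> Writing one elementary field changes the combined value too.
    move 12 to dob-month
    display "date is " date-of-birth
    stop run.
```

After the second move, the last line should read `date is 2010-12-02`: the group and its fields are the same ten bytes, just viewed at different levels.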

My understanding is that in the mainframe world, COBOL programs are parts of much larger workflows, all coordinated by JCL scripts. Since all data is already in fixed column form, tying together these COBOL programs is similar to how you build pipes out of different programs when you write shell scripts in Unix.

[1] https://learning.oreilly.com/library/view/beginning-cobol-fo...