by danenania 808 days ago
This looks interesting. I'm working on an OpenAI-based tool for coding tasks that are too complex for ChatGPT - https://github.com/plandex-ai/plandex

It's working quite well for me, but it definitely needs some time spent on benchmarking and ironing out edge cases.

I'm especially curious how it will do on more "obscure" languages. Not that Cobol is obscure exactly--I suppose there's probably quite a bit of it in GPT-4's training considering how pervasive it is in some domains. In any case, I'll try out this benchmark and see how it goes.

3 comments

> Not that Cobol is obscure exactly--I suppose there's probably quite a bit of it in GPT-4's training considering how pervasive it is in some domains

There is a huge amount of COBOL code in existence, but almost all of it is non-public code used to run businesses and governments. Very little of it is publicly source-available (whether open source or something more restrictive than that).

Unless GPT-4's training data includes non-public code bases (I doubt it), it likely has rather little COBOL code in it.

I've been using GPT-4 to help me navigate a mainframe and a COBOL codebase, and it knows far more than what my googling abilities manage to fish up in forums. It's actually surprisingly good at surprisingly deep mainframe topics.
No doubt its training data contains a lot of IBM manuals, probably even some commercial books on relevant topics, maybe even the contents of some of the forums you mention – and all that could be enough to correctly answer your questions.

However, for languages like Python, Java, C, C++, JavaScript, Go, etc., it also contains untold millions of lines of code slurped from places like GitHub. I really doubt it contains anywhere remotely near as much COBOL code, because if you look for COBOL code in public GitHub repos, you will find very little. The vast majority of COBOL code is in-house or vendor business software, and few seem to want to make that stuff public. What COBOL code GitHub does have is mostly toy exercises or ancient stuff, not examples of significant contemporary production code. The only way OpenAI is going to get a substantial quantity of that is if multiple private parties (such as banks) give them access to their COBOL code bases. That is not impossible, but absent some public info saying it has happened, it seems more likely it hasn't.

I expect GPT-4 (or any LLM) is not going to perform as well on complicated programming tasks in COBOL as in other languages. For more mainstream languages, it has millions of examples to help it do a better job; for COBOL, it likely doesn't.

Look, nobody is going to perform as well on complicated programming tasks using COBOL as with Python. But knowing everything you said, I was amazed at how good it was. Try it.

Also, nobody needs to do complicated coding tasks with COBOL; it wasn't meant for that. What we do need a lot of is translating COBOL to Python or Java.

To rephrase my point: the gap between the best an LLM can do and the best an experienced human can do is likely larger for COBOL than for more mainstream languages, simply because LLMs have a lot more opportunities to gain "experience" with those more mainstream languages than they do with COBOL.

What you are saying may well all be true, but it doesn't contradict what I'm saying.

Are you hiring for it? I don't know COBOL, but I do know Python and Java to some extent, and I enjoy esoteric legacy problems.
You can now learn COBOL online; see IBM's offering on Coursera, e.g.: https://www.coursera.org/learn/cobol-programming-vscode
I'm only hiring locally in Israel right now
It even sucks at Guile Scheme, according to my experiences with GPT 3.5.
But it probably read all the books ever published on COBOL.
Almost certainly not. There are heaps of books in libraries which nobody has scanned yet, including many on COBOL. No LLM has read those.

Whether that makes a difference depends. If you are dealing with mainstream modern COBOL (like recent versions of MicroFocus or IBM Enterprise), it probably won't. If you are dealing with some obscure legacy COBOL dialect, odds are high there is some very helpful printed book which nobody has scanned.

> I'm especially curious how it will do on more "obscure" languages.

There’s definitely a lack of training data and ability (but unfortunately not confidence) in less widespread languages. It’s quite bad at Pine Script: it confuses versions of the language, produces unrunnable code, and is unable to correct it when given feedback.

This looks great! Can’t wait to try it out today