|
No doubt its training data contains a lot of IBM manuals, probably some commercial books on relevant topics, and maybe even the contents of some of the forums you mention, and all that could be enough to answer your questions correctly. However, for languages like Python, Java, C, C++, JavaScript, Go, etc., it also contains untold millions of lines of code slurped from places like GitHub. I really doubt it contains anywhere near as much COBOL code, because if you look for COBOL code in public GitHub repos, you will find very little. The vast majority of COBOL code is in-house or vendor business software, and few seem to want to make that stuff public; what COBOL code GitHub does have is mostly toy exercises or ancient stuff, not examples of significant contemporary production code. The only way OpenAI is going to get a substantial quantity of that is if multiple private parties (such as banks) give them access to their COBOL code bases. That's not impossible, but absent some public info saying it has happened, it seems more likely it hasn't. So I expect GPT-4 (or any LLM) is not going to perform as well on complicated programming tasks in COBOL as in other languages. For more mainstream languages, it has millions of examples to help it do a better job; for COBOL it likely doesn't. |
Also, nobody needs to do complicated coding tasks with COBOL; it wasn't meant for that. What we do need a lot of is translating COBOL to Python or Java.