Hacker News
by irrational 586 days ago
I was recently assigned to work on a huge legacy ColdFusion backend service. I was very surprised at how useful AI was with the code. It was even better, in my experience, than what I've seen with Python, Java, or TypeScript. The only explanation I can come up with is that there is so much legacy ColdFusion code out there that was used to train Copilot, and whatever AI JetBrains uses for code completion, that this is one of the languages they are best suited to assist with.
4 comments

Perhaps it is the reverse: That ColdFusion training sources are limited, so it is more likely to converge on a homogenization?

Casually, we usually think of a programming language as being one thing, but in reality a programming language generally only specifies a syntax. All of the other features of a language emerge from the people using it. And because of that, two different people can end up speaking two completely different languages even when sharing the same syntax.

This is especially apparent when you witness someone who is familiar with programming in language X start learning language Y. You'll notice that, at least at first, they will still try to write their programs in language X using Y syntax, instead of embracing language Y in all its glory. Now, multiply that by the millions of developers who will touch code in a popular language like Python, Java, or TypeScript and things end up all over the place.

So while you might have a lot more code to train on overall, you need a lot more code for the LLM to be able to discern the different dialects that emerge out of the additional variety. Quantity doesn't imply quality.
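
To make the "language X in Y syntax" point concrete, here is a minimal sketch in Python. Both functions are hypothetical examples made up for illustration, not taken from any real codebase. They compute the same thing, but the first is written the way someone coming from C or Java might write it, and the second the way a long-time Python user probably would. A model trained on both has to learn that these are the same "sentence" in two dialects.

```python
def sum_even_squares_transplanted(numbers):
    """Index-driven loop with manual accumulation, as in C or Java."""
    result = 0
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            result = result + numbers[i] * numbers[i]
        i = i + 1
    return result


def sum_even_squares_idiomatic(numbers):
    """Same behavior, expressed with a generator expression."""
    return sum(n * n for n in numbers if n % 2 == 0)
```

Neither version is wrong. The point is only that the training data for a popular language contains many such dialects of the same logic.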

I wonder what a language designed as a target for LLM-generated code would look like? What semantics and syntax would help the LLM generate code that is more likely to be correct and maintainable by humans?
Perhaps something like COBOL? (Shudder.)
That's great, but it's a sample size of 1, and AI utility is also self-confirmation-biasing. If the AI stops providing useful output, you stop using it. It's like "what you're searching for is always in the last place you look". After recognizing AI's limits, most people wouldn't keep asking it to do things they've learned it can't do. But still, there's an area of things it does, and an (ok, fuzzy) boundary to its capabilities.

Basically, for any statement about AI helpfulness, you need to quantify how far it can help you. Depending on your personality, anything else is likely either always a success (if you have a positive outlook) or a failure (if you focus on the negative).

But where did these companies get the ColdFusion code for their training data? Since ColdFusion is an old language and used for backend services, how much ColdFusion code is open source and crawlable?
I'm definitely assuming that they don't limit their training data to what is open source and crawlable.
That's a good question. I presume there is some way to check GitHub for how much code in each language is available on it.
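
One rough way to check, as a sketch: GitHub's REST search endpoint for repositories accepts a `language:` qualifier and returns a `total_count` field. The helper names below are my own, and this counts public repositories whose primary detected language matches, not lines of code, so treat it as a loose proxy at best. Unauthenticated requests are also rate limited.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language):
    """Build the repository search URL for one language (hypothetical helper)."""
    # per_page=1 because only the total_count field is of interest here.
    query = urlencode({"q": f"language:{language}", "per_page": 1})
    return f"{SEARCH_URL}?{query}"


def parse_total_count(payload):
    """Extract total_count from the JSON text of a search response."""
    return int(json.loads(payload)["total_count"])


def count_repos(language):
    """Query GitHub and return the number of matching public repositories."""
    request = Request(
        build_search_url(language),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request, timeout=10) as response:
        return parse_total_count(response.read().decode("utf-8"))
```

Calling `count_repos("ColdFusion")` and `count_repos("Python")` would give a relative sense of how much public code exists for each, assuming the language names match what GitHub detects.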
Similar experience with Perl scripts being rewritten in Golang. Crazy good experience with Claude.