Hacker News
by WorldMaker 44 days ago
I tried to capture some of my feelings on this in a recent personal blog post/rant. The easiest phrase is that LLMs are "legacy code as a service". They are trained on other people's legacy code. (No one is intentionally feeding LLMs their best proprietary code.) They produce output that is "Day 1 Legacy Code" in the sense that there's no human code owner to take responsibility. You might be able to ask the LLM that built it questions, but it is easier to accept that the LLM that wrote it is, in effect, no longer at the company (between context/memory limitations, regular model upgrades/retrainings, etc.).

But also, yeah, it starts to get worse than classic legacy code, because with classic legacy code you could at least try to build a theory of mind about the author(s). There were skills in trying to "mind read" a past generation, in finding clues in the poem's words more than in its form. (The variable names and whatever comments may have survived, including commit logs; things written for humans to help explain the whys/hows, not just the whats.)

"legacy code as a service" - that's apt. But would they be better if they trained exclusively on 'good code'? I know I don't know the answer to that question, and I get the feeling that few people actually understand how they work well enough to feel comfortable asserting that to be true.
Yeah, I still wouldn't trust them if they were training on more good code, either. I think I understand enough of how they work to believe that even given plenty of good code they won't be able to learn the parts that make good code truly good. That's where I start into poetry metaphors: the best code is concerned not just with poetic form (the rhythm and meter required by the language) or literal meaning (the compiler output) but also with the human elements of the poem, such as the creative storytelling and multiple levels of metaphor. I cannot see the current technology getting good at those human parts of the poetry, no matter how good it gets at the literal and the form.
The problem there is the _large_ language model part, the density and the reinforcement of the weights. There's far less good code in the world. ;) These things emit code as well as I do, such as they do, only because they've inhaled essentially the totality of "code in general", not artisanal code.