by boscillator 3 hours ago
> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.

This is the number one code smell from LLMs and I don't know why they are so obsessed with it. In python, it often comes as `hasattr` checks on types that are defined to have that attribute, in a code base that is fully type-checked.
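To make the `hasattr` complaint concrete, here is a minimal sketch assuming a fully type-checked codebase. `User`, `display_name_defensive`, and `display_name` are hypothetical names, not from any real project. The first function is the defensive pattern described above; the second trusts the declared type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # `name` is declared, so every User instance has it; a type checker enforces the str type.
    name: str


def display_name_defensive(user: User) -> str:
    # The style being complained about: re-checking what the type already guarantees.
    if hasattr(user, "name") and user.name is not None:
        return user.name
    return "<unknown>"  # dead branch for any well-typed User


def display_name(user: User) -> str:
    # Trusting the declared type: no unreachable fallback to read, test, or maintain.
    return user.name


if __name__ == "__main__":
    print(display_name(User("ada")))  # prints "ada"
```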

Why do they do that? Is it from pre-training or reinforcement learning? If the latter, can the labs please fix this?


Likely they just err on the side of unnecessary error handling rather than missing error handling. The labs probably penalize runtime errors harshly in training.
I suspect it's mostly the training data. I am also on team "make illegal states unrepresentable". It may get talked about a lot on HN, but I'm still surprised when I see a code base in the wild that I didn't write, either open source or at work, that does a really good job of it. Most programmers still think in terms of picking up the pieces and fixing errors at the point where the error message pops out, rather than making it so the error can't happen and the data reflects that.
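One common way to apply "make illegal states unrepresentable" in Python is a union of small frozen dataclasses in place of a loosely shaped dict. This is a minimal sketch; the connection-state example and every name in it (`Disconnected`, `Connected`, `describe`, `describe_loose`) are hypothetical illustrations, not taken from the thread.

```python
from dataclasses import dataclass
from typing import Union


# Loose representation: nothing stops {"connected": False, "session_id": "abc"}
# or {"connected": True, "session_id": None}, so every reader has to re-check.
def describe_loose(state: dict) -> str:
    if state.get("connected") and state.get("session_id"):
        return f"connected ({state['session_id']})"
    return "disconnected"


@dataclass(frozen=True)
class Disconnected:
    pass


@dataclass(frozen=True)
class Connected:
    session_id: str  # a Connected value cannot be built without a session id


ConnectionState = Union[Disconnected, Connected]


def describe(state: ConnectionState) -> str:
    # Only the two legal shapes exist, so there is no malformed case to handle.
    if isinstance(state, Connected):
        return f"connected ({state.session_id})"
    return "disconnected"
```

The loose version silently maps the contradictory `{"connected": True, "session_id": None}` to "disconnected"; the typed version simply has no way to express it.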

I say "mostly" because I think there's also a problem with AIs thinking this way in their current state. That last level of human understanding of a code base, where the human holistically understands the flow of those guarantees, is hard to give them right now. On the raw code level, this sort of thing often involves enough code to easily blow out their context window. Trying to summarize it in memories-style files has its own problems; just because there is text written down about the guarantees doesn't mean that the AI is going to get the right info out of it, any more than a human might from just reading the code. I won't say it's "impossible" to give an AI this understanding, because I'm not sure it is, but it is a level of understanding that, even if you manage to give it to them, their habits tend to work against.

My own solution to this problem has largely been to give up on them getting this. I prompt a solution to the problem the way that most people do; then, if I want to make illegal states unrepresentable, I prompt the AI through the necessary refactorings, unless it's so small that I just do it myself. Given a lot of code that uses maps/dicts and arrays and strings and ints, if you prompt it through making those more thoroughly typed, it's actually pretty good at it. I've not had a lot of luck getting good designs out of single prompts, even when I get detailed. Treating it as two separate tasks seems to work out well.
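A rough sketch of what the second, refactoring pass might turn dict/string/int code into, assuming Python 3.9+. The order domain and the names `Sku`, `Line`, `Order`, and `order_total_untyped` are hypothetical. Validation happens once, in `__post_init__`, so code that receives a `Line` does not need to re-check it.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import NewType


# First-pass output (typical draft): stringly typed dicts, shape checked nowhere.
def order_total_untyped(order: dict) -> Decimal:
    return sum((Decimal(line["price"]) * line["qty"] for line in order["lines"]), Decimal("0"))


# Refactoring-pass output: the same data with its structure made explicit.
Sku = NewType("Sku", str)


@dataclass(frozen=True)
class Line:
    sku: Sku
    unit_price: Decimal
    qty: int

    def __post_init__(self) -> None:
        # Validate once at construction; every Line that exists satisfies these.
        if self.qty <= 0:
            raise ValueError("qty must be positive")
        if self.unit_price < 0:
            raise ValueError("unit_price must be non-negative")


@dataclass(frozen=True)
class Order:
    lines: tuple[Line, ...]

    def total(self) -> Decimal:
        return sum((line.unit_price * line.qty for line in self.lines), Decimal("0"))
```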

And watch the diffs on the types carefully; AI loves to sneak in a ".JustSetItAndIgnoreAllThePreAndPostConditions(string)" method. After all, I suspect there's plenty of training data out there of "types that were nicely structured to make error states unrepresentable, and then a later maintainer came along and added a 'JustEffingDoIt' method that broke everything". One of the best defenses is to make sure that the type implementing these things is in its own file, so you can easily look at all the methods it adds to those types and smack it when it does that. I've tried slathering on warnings about not doing this and explaining the pre- and post-conditions being maintained in the docs, but the improvement seems marginal.
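A minimal sketch of the "own file, easy to audit" idea, assuming a hypothetical `Percentage` type in its own module; `set_raw` is a hypothetical stand-in for the kind of escape-hatch method described above and is left commented out. Python privacy is only a convention (`_value` can still be assigned from outside), so reviewing the diff on this file remains the real defense.

```python
# percentage.py (hypothetical module): the invariant-carrying type lives alone,
# so every method that can touch its state shows up in one short diff.


class Percentage:
    """A value kept in the range [0, 100]."""

    __slots__ = ("_value",)

    def __init__(self, value: float) -> None:
        # Precondition checked at the only entry point that sets state.
        if not 0 <= value <= 100:
            raise ValueError(f"percentage out of range: {value}")
        self._value = float(value)

    @property
    def value(self) -> float:
        return self._value

    def add(self, other: "Percentage") -> "Percentage":
        # Returns a new instance built via __init__, so the check always runs.
        return Percentage(min(self._value + other._value, 100.0))

    # The kind of addition to reject in review: it writes state without
    # going through the check in __init__.
    # def set_raw(self, value: float) -> None:
    #     self._value = value
```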

It’s because it matches the patterns they are trained to follow. They don’t understand the code. They can’t reason about the actual logic flow. They can only work with patterns.
Sorry to say, but the solution is to stop using python. The models are trained to code defensively, assuming historically representative python codebases. The models trust the types a lot more in languages where the canonical historical examples trust the types, because the language is constructed around that premise.