

by dusted 39 days ago

The generated code is fine if it's a self-contained class of average size or below. But even with extensive architecture and constant supervision, it does not take long before it degenerates into "focused fixes", shortcuts, laziness, and just outright cheating or lying. So far, no amount of prompting has led me beyond this. It's paradoxical how the model seems to reason about the correctness (or wrongness) of a proposed architecture and design, can write a plan that seems to take this into account, answer questions about the plan correctly (even the ones meant to uncover nuances that may be unclear), ask tons of clarifying questions, and update both plan and spec docs correctly, and yet continue to act like a "ticket closer" who immediately puts on the biggest possible blinkers (horse blinkers) and thoroughly ignores all of it when building that same plan, referencing those same documents...

Attempting anything comprehensive with AI is the software development analogue of the Gell-Mann Amnesia effect.

I'm definitely thinking deeply now about how I'm approaching these tools going forward. Yes, GPT5 is better than I am at spitting out a fairly acceptable skeleton for a class in one go, when prompted hard enough. But it will happily do things like write decent-looking protobuf schemas and then go ahead and hide everything that takes even the slightest amount of reasoning behind some binary blob nested deep enough that it'll get past even the most dedicated reviewer.

It's fairly good at a lot of the things that I don't find interesting to deal with, but it's also amazingly incompetent when it comes to even the most mundane kind of common sense. It steers so strongly towards textbook examples that it will happily put in three times the amount of code and handle multiple classes of actually impossible edge cases, and even use cases that it was specifically asked NOT to add. And it will defend it with "well, I added this because I can't know if someone is going to use the thing I just added". Well, if you hadn't added it, the chances would indeed be slimmer.

It's so good at answering questions, explaining what's there, and diving through call paths, and yet it drops the ball the moment it has to actually do something beyond saving me from looking up how to write some really annoying and uninteresting boilerplate.

The worst thing is how good it is at making things LOOK right. It will cover every single edge case you throw at it, but not because of the design: not because it correctly argues why the architecture inherently allows such and such, or because the design and spec flesh out that A goes to B and never the other way around. As soon as it's time to make something, it will make sure B can go to A, especially, it seems, if allowing that prevents it from doing the right thing, which is WHY those edge cases were trivial in the first place. Instead, it will endlessly hack around them. I've worked with people like that too, so I don't know if I am really blaming the models or the training data.

But damn, it's a tough spot.

I've had multiple situations where, after wasting hours of work that I should have just spent doing it myself, the only thing I really wished was for the model to be sentient, able to feel pain, and have a corporeal body so I could drag it outside and beat it to a pulp. (I've never reached that level of frustration with an actual person, so that's something new they bring to the table.)