Hacker News
by _dwt 121 days ago
This (LOC is an anti-metric, Goodhart's Law, etc.) is true, but I'm reaching the point of "fuck nuance" when I see so many articles superficially critical of AI which contain things like this:

> If AI-generated code introduces defects at a higher rate, you need more review, not less AI.

I think that is very much up for debate despite being so frequently asserted without evidence! This strikes me as the same argument as we see about self-driving cars: they don't have to be perfect, because there is (or we can regulate that there must be) a human in the loop. However, we have research and (sometimes fatal) experience from other fields (aviation comes to mind) about "automation complacency" - the human mind just seems to resist thoroughly scrutinizing automation which is usually right.

I don't disagree with you entirely here. I probably wasn't clear enough on what I was trying to convey.

Right now AI / agentic coding doesn't seem like a train we are going to be able to stop, and at the end of the day it is a tool like any other. Most of what seems to be happening is that people let AI fully take the wheel: not enough specs, not enough testing, not enough direction.

I keep experimenting and tweaking how much direction to give AI in order to produce less fuckery and more productive code.

Sorry for coming off combative - I'm mostly fatigued from the "criti-hype" pieces we've been deluged with over the last week. For what it's worth I think you're right about the inevitability, but I also think it's worth pushing a bit against the pre-emptive shaping of the Overton window. I appreciate the comment.

I don't know how to encourage the kind of review that AI code generation seems to require. Historically we've been able to rely on the fact that (bluntly) programming is "g-loaded": smart programmers probably wrote better code, with clearer comments, formatted better, and documented better. Now, results that look great are a prompt away in each category, which breaks some subconscious indicators reviewers pick up on.

I also think that there is probably a sweet spot for automation that does one or two simple things and fails noisily outside the confidence zone (aviation metaphor: an autopilot that holds heading and barometric altitude and beeps loudly and shakes the stick when it can't maintain those conditions), and a sweet spot for "perfect" automation (aviation metaphor: uh, a drone that autonomously flies from point A to point B using GPS, radar, LIDAR, etc...?). In between I'm afraid there be dragons.
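To make the "fails noisily" end of that spectrum concrete, here is a minimal sketch of what I mean. Everything in it is hypothetical (the `HeadingHold` class, the `OutOfEnvelope` exception, the 5-degree threshold); it is only an illustration of automation that does one simple thing and refuses loudly outside its confidence zone, not how any real autopilot works.

```python
from dataclasses import dataclass


class OutOfEnvelope(Exception):
    """Raised when the automation cannot hold its target; the human must take over."""


@dataclass
class HeadingHold:
    """Hypothetical single-purpose automation: hold one heading, nothing else."""

    target_deg: float
    max_error_deg: float = 5.0  # assumed size of the "confidence zone"

    def correction(self, current_deg: float) -> float:
        # Signed shortest-path heading error, in the range [-180, 180).
        error = (self.target_deg - current_deg + 180.0) % 360.0 - 180.0
        if abs(error) > self.max_error_deg:
            # Fail noisily instead of quietly attempting a large correction.
            raise OutOfEnvelope(
                f"heading error {error:+.1f} deg exceeds {self.max_error_deg} deg"
            )
        return error
```

The point of the design is that there is no "best effort" branch: inside the zone it returns a small correction, outside it raises, so the human never has to guess whether the automation is still doing its job.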

@_dwt don't worry, you didn't. I appreciate good discussion and criticism. The publication is new and I'm still trying to calibrate my voice and style for it.

> I don't know how to encourage the kind of review that AI code generation seems to require. Historically we've been able to rely on the fact that (bluntly) programming is "g-loaded": smart programmers probably wrote better code, with clearer comments, formatted better, and documented better. Now, results that look great are a prompt away in each category, which breaks some subconscious indicators reviewers pick up on.

I don't think anyone knows for sure; we are all in the same boat, trying to figure out how best to work with AI. The pace of change makes it incredibly difficult to keep up or try things. I'm trying a bunch of stuff at the same time:

- https://structpr.dev/ - to try to rethink how we approach PR reading and organizing review (dog-fooding it right now, so it is mostly alpha)

- I have an article scheduled for next week about StrongDM's Software factory; there are some interesting ideas there, like test holdouts

- Some experiments in the Elixir stack for code generation and verification that go beyond "it looks great". AI can definitely create code that _looks_ great, but there is plenty of research showing that a lot of AI-generated code and tests can give a high degree of false confidence.