by yojo 67 days ago

I’ve been driving Claude as my primary coding interface the last three months at my job. Other than a different domain, I feel like I could have written this exact article.

The project I’m on started as a vibe-coded prototype that quickly got promoted to a production service we sell.

I’ve had to build the mental model after the fact, while refactoring and ripping out large chunks of nonsense or dead code.

But the product wouldn’t exist without that quick and dirty prototype, and I can use Claude as a goddamned chainsaw to clean up.

On Friday, I finally added a type checker pre-commit hook and fixed the 90 existing errors (properly, no type ignores) in ~2 hours. I tried full-agentic first, and it failed miserably. Then I went through error by error with Claude: we tightened up some existing types, fixed some clunky abstractions, and got a nice, clean result.

AI-assisted coding is amazing, but IMO for production code there’s no substitute for human review and guidance.

4 comments

My process: start ideating and get the AI to poke holes in your reasoning, your vision, scalability, etc. Do this for a few days while taking breaks. This is all contained in one .md file with Mermaid diagrams and sections.

Then use ideation to architect, dive into details and tell the AI exactly what your choices are: how certain methods should be called, how logging and observability should be set up, what language to use, type checking, coding style (configure ruthless linting and formatting before you write a single line of code), and what testing methodology and framework to use for unit, integration, and e2e tests. For database changes, you will handle migrations yourself. Specify as much as possible so the AI is as confined as possible to how you would do it.

Then create a plan file, have the AI manage it like a task list, and implement in parts. Before starting each part, it needs to present you a plan. In it you will notice that it makes mistakes, misunderstands some things that you maybe didn’t clarify before, or just forgets. You add to AGENTS.md or whatever, make changes to the AI’s plan, tell it to update the plan.md and, when satisfied, proceed.

After done, review the code. You will notice there is always something to fix. Hardcoded variables, a sql migration with seed data that should actually not be a migration, just generally crazy stuff.

The worst is that the AI is always very loose on requirements. You will notice all its fields are nullable and records have little to no validation. You report an error when testing and it tries to solve it with a brittle async solution, like LISTEN/NOTIFY or a callback, instead of doing the architecturally correct solution. These are things that at scale are hell to debug, especially if you did not write the code.
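To make the "all fields nullable, little to no validation" point concrete, here is a minimal TypeScript sketch. The `LooseUser` and `User` types and the `parseUser` helper are hypothetical names invented for this illustration, not taken from any project mentioned in the thread, and the validation rules are only assumptions about what "strict" might mean.

```typescript
// Hypothetical example: the kind of loose record an agent tends to produce.
// Every field is optional and nullable, so nothing is guaranteed downstream.
interface LooseUser {
  id?: string | null;
  email?: string | null;
  age?: number | null;
}

// A stricter alternative: required fields, validated once at the boundary.
interface User {
  id: string;
  email: string;
  age: number;
}

// Validate untrusted input once; the rest of the code can rely on `User`.
function parseUser(input: LooseUser): User {
  const { id, email, age } = input;
  if (typeof id !== "string" || id.length === 0) {
    throw new Error("id is required");
  }
  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("email must contain '@'");
  }
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    throw new Error("age must be a non-negative integer");
  }
  return { id, email, age };
}
```

The design choice is to reject bad data at the edge instead of letting `null` checks spread through every caller.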

If you do this and iterate you will gradually end up with a solid harness and you will need to review less.

Then port it to other projects.

> After done, review the code. You will notice there is always something to fix. Hardcoded variables, a sql migration with seed data that should actually not be a migration, just generally crazy stuff.
>
> The worst is that the AI is always very loose on requirements. You will notice all its fields are nullable, records have little to no validation, you report an error when testing and it tried to solve it with a brittle async solution, like LISTEN/NOTIFY or a callback instead of doing the architecturally correct solution. Things that at scale are hell to debug, especially if you did not write the code.

For that I usually get it reviewed by LLMs first, before reviewing it myself.

Same model but in a clean session, plus different models from different providers. And multiple (at least 2) automated rounds: review -> triage by the implementing session -> addressing the feedback, with reasons for anything deferred or ignored -> review -> triage by the implementing session -> …

Works wonders.

Committing the initial spec / plan also helps the reviewers compare the actual implementation to what was planned. Didn’t expect it, but it’s worked nicely.

LISTEN/NOTIFY is not brittle, we use it for millions of events per day.
I agree! It should be very stable, IMO. If not, then please send a bug report and we'll look into it. Also, now it scales well with the number of listening connections (given clients listen on unique channel names): https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
The LISTEN/NOTIFY feature really just doesn’t get enough PR. It is perfectly suitable for production workloads yet people still want to reach for more complicated solutions they don’t need.
It's not the feature itself, it's how/what the llm tries to use it for. It uses it to cross any and all architectural boundaries.
I find it very interesting that you assume this method would branch out to other projects. I find it even more interesting that you assume all software codebases use a database, give a damn about async anything, and that these ideas percolate out to general software engineering.

Sounds like a solid way to make crud web apps though.

GP is clearly providing examples of categories of tasks. Sure, not all languages do “async fn foo()”, but almost all problem domains involve some sort of making sure the right things happen at the right times, which is in a similar ballpark.

Holier than thou “yeah well I work on stuff that doesn’t use databases, checkmate!” doesn’t really land - data still gets moved around somehow, and often over a network!

Not trying to "land" anything.
I’ve found that LLMs will frequently do extremely silly things that no person would do to make TypeScript code pass the type checker.
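The comment does not name a specific case, so the following is only an illustrative guess at the kind of shortcut meant: a double cast that satisfies the compiler without checking anything, next to a type guard that actually verifies the shape. `Config`, `loadConfigUnsafe`, `isConfig`, and `loadConfig` are hypothetical names made up for this sketch.

```typescript
// Hypothetical config shape, invented for this example.
interface Config {
  port: number;
}

// Shortcut that silences the type checker without verifying anything:
// the cast compiles even when the parsed value has the wrong shape.
function loadConfigUnsafe(json: string): Config {
  return JSON.parse(json) as unknown as Config;
}

// Type guard: checks the shape at runtime so the compiler's view is true.
function isConfig(raw: unknown): raw is Config {
  return (
    typeof raw === "object" &&
    raw !== null &&
    typeof (raw as { port?: unknown }).port === "number"
  );
}

// Safer loader: parse to `unknown`, then narrow with the guard.
function loadConfig(json: string): Config {
  const raw: unknown = JSON.parse(json);
  if (!isConfig(raw)) {
    throw new Error("invalid config: expected { port: number }");
  }
  return raw;
}
```

With the unsafe version, `{"port":"8080"}` comes back typed as a `Config` whose `port` is really a string; the guarded version throws instead.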
I've noticed this too, not so much with type checkers as with linters. And I can't really figure out if there's even a way to solve it.

If you set up restrictive linters and don't explicitly prohibit agents from adding inline allows, most LOC will be allow comments.

Based on this learning, I've decided to prohibit any inline allows. And then agents started doing very questionable things to satisfy clippy.

Recent example:

- Claude set up a test support module so that it could reuse things. Since this was not used in all tests, Rust complained about dead_code. Instead of making it work, Claude decided to remove the test support module and just... blow up each test.

If you enable thinking summaries, you'll always see the agent saying something like: "I need to be pragmatic", which is the right choice 50% of the time.

Yeah, I've found LLMs cannot write good TypeScript code, period. The good news is that they are excellent at some other languages.
I can't agree here. https://pelorus-nav.com/ (one of my side projects) is 95-98% written by Claude Opus 4.6, all in very nice typescript which I carefully review and correct, and use good prompting and context hygiene to ensure it doesn't take shortcuts. It's taken a month or so but so worth it. And my packing list app packzen.org is also pretty decent typescript all through.
> which I carefully review and correct

So you do agree? If you are having to review and correct then it's not really the LLM writing it anymore. I have little doubt that you can write good Typescript, but that's not what I said. I said LLMs cannot write good Typescript and it seems you agree given your purported actions towards it. Which is quite unlike some other languages where LLMs write good code all the time — no hand holding necessary.

I find correction is rarely necessary with Opus 4.6. Definitely not so much that "it's not really the LLM writing it anymore." More like it's the author and I'm the editor (in this limited case -- of course architecturally the ideas are all mine.) But I totally respect that my prompt style, the type of app I'm writing, and other factors could be influencing my success vs. others' lack of success.
> of course architecturally the ideas are all mine.

What else would you need to correct? I've never had trouble with LLMs generating basic syntax in any language. Architecture is exactly the aspect of language where LLMs seem to like to go to crazytown when in Typescript. It seems you've noticed too if the ideas in that area have had to come all from you.

I think it can write working TypeScript code, and it can write good TypeScript code if it is guided by a knowledgeable programmer. It requires actually reviewing all the code and giving pointed feedback though (which at that point is only slightly more efficient than just writing it yourself).
> It requires actually reviewing all the code and giving pointed feedback though

Exactly. You can write good Typescript, no doubt, but LLMs cannot. This is not like some other languages where LLM generated code is actually consistently good without needing to become the author.

You need to be very specific and also question the output if it does something insane.
I've found it's less about specificity and more about reducing the number of critical assumptions it needs to make. Being too specific can be a hindrance in its own right.

And that's also a decent barometer for what it's good at. The more critical assumptions the AI needs to make, the less likely it is to make good ones.

For instance, when building a heat map, I don't have to get specific at all because the number of consequential assumptions it needs to make is small. I either don't care about the colors or the label placement, or I can easily change them.

This decade’s version of “works on my box”
I caught it using Parameters<typeof otherfn>[2] the other day. It wanted to avoid importing a type, so it did this nonsense. (I might have the syntax slightly wrong here, I'm writing from memory.)

But it's not all bad news. TIL about Parameters<T>.
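For anyone else meeting it for the first time: `Parameters<T>` is a built-in TypeScript utility type that yields a function's parameter types as a tuple, so indexing it with `[2]` picks the third parameter. The sketch below uses a hypothetical `fetchWithRetry` function and `RetryOptions` type, invented here, to show the positional form next to simply using the named type.

```typescript
// Hypothetical type and function, invented for this example.
interface RetryOptions {
  attempts: number;
  delayMs: number;
}

function fetchWithRetry(url: string, timeoutMs: number, options: RetryOptions): string {
  return `${url} (timeout ${timeoutMs}ms, ${options.attempts} attempts, ${options.delayMs}ms apart)`;
}

// The pattern from the comment: derive the third parameter's type by position
// instead of referring to it by name.
type OptionsByPosition = Parameters<typeof fetchWithRetry>[2];

// Both annotations resolve to the same type, so the assignment type-checks.
const viaPosition: OptionsByPosition = { attempts: 3, delayMs: 100 };
const viaName: RetryOptions = viaPosition;

const description = fetchWithRetry("https://example.com", 500, viaName);
```

The positional form compiles, but it quietly points at a different type if someone reorders the function's parameters, which is one reason to prefer importing the named type when it is exported.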

> On Friday...

This should be done on day one with a company-wide skill or project template that defines hard limits and processes for the Agent.

Strict linters, formatters and code quality checks are essential to de-slopify the code as much as possible.

That doesn't fix bad design though, that's still on humans.

Fwiw, the article mirrors my experience when I started out too: the same first month of vibecoding, then a next project that I did exactly like he outlined.

Personally, I think it's just the natural flow when you're starting out. If he keeps going, his opinion is going to change and as he gets to know it better, he'll likely go more and more towards vibecoding again.

It's hard to say why, but you get better at it, even if it's really hard to put into words why.

Given how addictive vibecoding is, I think it's very hard to be objective about the results if you are involved in the process.
It's a little like asking a cokehead how the addiction is going for him while he is high. Obviously he's going to say it's great because the consequences haven't hit him. Some percentage of addicts will never realize it was a problem at all.

It's not random that AI happens to be built by the very same people that turned internet forums into the most addictive communication technology ever.

> he'll likely go more and more towards vibecoding again

I think "more and more" is doing some very heavy lifting here. On the surface it reads like "a lot" to many people, I think, which is why this is hard to read without cringing a bit. Read like that it comes off as "It's very addictive and eventually you get lulled into accepting nonsense again, except I haven't realized that's what's happening".

But the truth is that this comment really relies entirely on what "more and more" means here.

[flagged]
really?

have you ever learned a skill? Like carving, singing, playing guitar, playing a video game, anything?

It's easy to get better at it without understanding why you're better at it. As a matter of fact, very very few people master the discipline enough to be able to grasp the reason why they're actually better.

Most people just come up with random shit which may or may not be related. Which I just abstained from.

I've learned a number of skills, and for me none of them worked in the way you're describing. I didn't learn to cut good miter joints by randomly vibe-sawing wood until I unlocked miter joints in the skill tree. I carefully studied the errors I made, and adjusted in ways I thought might correct them, some of which helped some of which did not. Then eventually I understood the relationship between my actions and the underlying principles in enough detail to consistently hit 45 degrees.
Isn't that example pretty reductive, in that you have a directly-measurable output? I mean, the joint is either 45° (well, 90°) or it's not. Zoom out a bit, and the skill-set becomes much less definable: are my cabinets good - for some intersection of well-proportioned, elegantly-finished, and fit for purpose, with well-chosen wood and appropriate hardware.

Mind you, I don't think the process of improvement in those dimensions is fundamentally different, just much less direct and not easily (or perhaps even at all) articulable.

You can get better at something without understanding why, but you should be able to think about it and determine why fairly easily.

This is something everyone who cares about improving in a skill does regularly - examine their improvement, the reasons behind it, and how to add to them. That’s the basis of self-driven learning.

This is an absurd statement. There are many complex undertakings in sport where even the very best get better with practice and can't tell you why. In fact, the ones who think they can tell you why are the ones to be most skeptical of.

You are just making stuff up or regurgitating material from a pop science book.

They can't tell you (not everyone is eloquent), but they sure know why. Struggling to put something into words is not the same as not knowing.
Not really. I can obviously say something, like you learn which features the models are able to actually implement, and you learn how to phrase and approach trickier features to get the model to do what you want.

And that's not really explainable without exploring specific examples. And now we're in thousands of words of explanation territory, hence my decision to say it's hard to put it into words.

I think you’re handwaving away vague, ungrounded intuition and calling it learning.

For instance, if I say “I noticed I run better in my blue shoes than my red shoes” I did not learn anything. If I examine my shoes and notice that my blue shoes have a cushioned sole, while my red shoes are flat, I can combine that with thinking about how I run and learn that cushioned soles cause less fatigue to the muscles in my feet and ankles.

The difference matters because if I don’t do the learning step, when I buy another pair of blue shoes but they’re flat soled, I’m back to square one.

Back to the real scenario, if you hold on to your ungrounded intuition re what tricks and phrasing work without understanding why, you may find those don’t work at all on a new model version or when forced to change to a different product due to price, insolvency, etc.