Hacker News
by csallen 12 days ago
For some reason, tons of people seem to be in camps at both extremes. It's either "AI sucks don't trust it!" or "AI is so much better than humans!"

But the most reasonable take, which I'm happy to see reflected in so many comments in this thread, is… use both.

Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI. Then the unique shortcomings of each party can be covered by the other's strengths.


AI review is never going to beat a fully resourced human review.

It might beat an under-resourced human review on time, efficiency, and cost. But on accuracy, throwing unlimited humans at a problem will still beat throwing unlimited AI at it.

That's an irrelevant comparison because cost is always a constraint, so there are not going to be unlimited AI or humans. The question is how to optimally combine them for a given cost.
> Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI.

You can do that, sure. But doing so negates any improvements in speed the LLM brought. And at that point, you may as well just do it yourself to begin with.

When Google showed up on the scene I found I no longer needed to memorize basic syntax and other such things. If I couldn't remember something on the fly, I'd just do a quick Google search and move on. This freed space in my mind to focus on bigger & better things.

I use GenAI tools a lot when coding, but I do not vibe code. I go through everything they generate, and we iterate. And yes, it doesn't save me a lot of time. But what it does do is free up mental capacity in a similar manner. Instead of syntax, though, it's more complicated patterns. Maybe I don't remember how to stitch something together, but I know it can be done. Instead of spending the time to look it up and then code it, I just tell the tool to do it for me.

> Maybe I don't remember how to stitch something together, but I know it can be done.

That's how I use the current AI tools, too. I never ask them to do something without specifying how it should be done. I ask questions first, use /plan to let the model ask me questions, then I let it execute the plan while I review the results. More and more often, I get something close enough to what I would have written. When I don't, I at least know exactly how to rewrite the result, if needed.

I observe the same effect as you: while it does sometimes speed up the implementation a bit, it's not very noticeable; however, it frees me from having to recall all the obscure little details up front. Instead, I can describe them, have the model implement them, and then recognize them (and refresh my memory) when reviewing. The effect is that it's easier to start a task because I don't need to prepare as much to execute it. It's especially notable on things that I haven't touched for some time. I know, more or less, how my Elixir projects are set up, but after ~2 years of not working on them, getting back into them used to be a hassle; with AI, it no longer is. I think the biggest difference comes from the AI lowering the cost of context switching for me. I used to have huge problems with that, and AI has certainly helped a lot.

Yeah, humans reviewing the AI review can only detect the false positives, where the LLM claims something is non-compliant and flags it for review/correction by a human or another agent. Human review can’t find the false negatives (true deficiencies not flagged) unless you do a full audit yourself to find whatever deficiencies the AI missed.
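To make the asymmetry concrete, here is a toy Python sketch. Everything in it is hypothetical: the `Item` records, the `defective` ground-truth field, and the function names are made up for illustration, and in a real review the ground truth is exactly what you don't have.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    defective: bool  # ground truth, unknown to the reviewer up front
    flagged: bool    # what the AI reviewer reported


def human_review_of_flags(items):
    """A human who only looks at AI-flagged items sorts them into
    confirmed defects and false positives."""
    confirmed = [i.name for i in items if i.flagged and i.defective]
    false_positives = [i.name for i in items if i.flagged and not i.defective]
    return confirmed, false_positives


def missed_defects(items):
    """False negatives: defects the AI never flagged. Finding these
    requires auditing the unflagged items as well."""
    return [i.name for i in items if i.defective and not i.flagged]


items = [
    Item("a", defective=True, flagged=True),    # true positive
    Item("b", defective=False, flagged=True),   # false positive
    Item("c", defective=True, flagged=False),   # false negative
    Item("d", defective=False, flagged=False),  # true negative
]

confirmed, false_positives = human_review_of_flags(items)
print(confirmed, false_positives, missed_defects(items))
```

The human pass over the flags sees "a" and "b" but never touches "c", which is the point: reviewing the AI's output bounds its false positives, not its false negatives.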
> But doing so negates any improvements in speed the LLM brought.

We could do with less speed.

I feel like you're missing the point that it's more thorough to use both. Speed isn't the only factor that matters.
This makes sense, but a logical next step is to have one AI write code, and then have another AI, instead of humans, verify it.

Or are current AIs too similar for that to be fruitful?

This is commonly known as "LLM-as-a-judge", and anecdotally, multiple people I know who write code through OpenRouter or with multiple models say it's surprisingly effective. It's strange that there don't appear to be any major papers on it since ~early 2025, which at this point is basically ancient history.
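For anyone who hasn't seen the pattern, the control flow is roughly the following. This is only a sketch under assumptions: `generate_with_judge` and the two toy functions are hypothetical names, and the stand-ins below are plain Python rather than model calls. In a real setup both `generate` and `judge` would call (ideally different) models.

```python
from typing import Callable, Optional, Tuple

# (task, feedback) -> candidate code
Generator = Callable[[str, str], str]
# (task, candidate) -> (approved, feedback)
Judge = Callable[[str, str], Tuple[bool, str]]


def generate_with_judge(task: str, generate: Generator, judge: Judge,
                        max_rounds: int = 3) -> Optional[str]:
    """Ask one model for code, ask a second to verify it, and feed the
    judge's feedback back to the writer until approved or out of rounds."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        approved, feedback = judge(task, candidate)
        if approved:
            return candidate
    return None  # nothing passed the judge within max_rounds


# Toy stand-ins so the loop can run without any model (assumptions).
def toy_generate(task: str, feedback: str) -> str:
    # First attempt is buggy; once there is feedback, it is "fixed".
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_judge(task: str, code: str) -> Tuple[bool, str]:
    namespace = {}
    exec(code, namespace)  # fine for a toy; never exec untrusted code
    ok = namespace["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) should be 5"


result = generate_with_judge("write add(a, b)", toy_generate, toy_judge)
print(result)
```

Whether this helps in practice depends on the question above: if the judge shares the writer's blind spots, it will approve the same mistakes.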