Hacker News new | ask | show | jobs
by malfist 10 days ago
One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.

I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?

I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that him and I are on two totally separate islands around the value of LLMs.

7 comments

Same here. A co-worker touched a few hundred files in a PR and asked us to review. They merged it directly to main when nobody approved it. (The repo was not set up to enforce PR approval.)

I don't personally use that feature, and I couldn't care less at this point. If our customers are frustrated by the bugs, at least my name is not on it.

The challenge is that you may not have customers, and thus a job, if that continues
His job will be to fix all the mess that other people created.
Crazy they merged into main holy moly
That's a process problem at your company - no developer should be proposing branches over 1k loc (or whatever your agreed tolerance threshold is) without a very good reason, vibe coded or not.
It isn't about small or big, it's about cohesion of the changes.

I prefer a big feature to be one big PR rather than a lot of small ones.

We had a dev do a big feature with a ton of small PRs, each one was individually impossible to review because each concern was out of scope for the small PR and "would be fixed in later PRs". Once it all came together as as whole, the big picture was a total horror show and I had to rewrite basically the whole thing.

In order to review those small PRs properly, each time I would have to read and understand all the current code so far from the beginning. Without that, each small PR individually looks OK because you won't remember the other PRs from weeks back that already duplicated what the current small PR does for example.

>> I prefer a big feature to be one big PR rather than a lot of small ones.

Yes, same, and I genuinely do not understand the insistence that PRs should not be above a certain size. I think most people are under the (misguided and wrong) impression that a PR review should take less than the time it took to write the code, and therefore allocate no more than 15-30 minutes per review. So when they come across a large PR they find themselves at a loss.

Fully agree. If you want to follow the thought process through a large PR, review each commit (assuming, of course, the author made reasonable commits) on its own.
> no developer should be proposing branches over 1k loc

I've seen that reaction many times. It seems to work well enough when someone is maintaining existing code. However, greenfield projects can often require literally orders of magnitude more code to deliver something that can be integration tested.

The first step is to break it up into a stack of commits. Each one must compile and pass its unit tests, of course. Keeping it under 1k loc of released executable code is usually easy, but often becomes difficult to impossible if you want well commented code with excellent unit test coverage.

Assuming you have kept all your commits under 1k loc, there is still the problem of whether you present them in one PR, or as a stack of PRs. The issue with a stack is why an API is designed a certain way often isn't evident until you see how it's used. Responses to PR comments are explanations that point to later PRs in the stack, which is irritating for both the reviewer and the author.

I haven't found a good solution. I'm not sure there is one.

I mean, you're not creating the api from thin air as you write code right? Usually you'd have some larger doc about the project with the design you can point to.
> no developer should be proposing branches over 1k loc

I completely agree with you. But I am afraid we are losing the battle.

I am seeing people repeatedly sending out gigantic PRs full of slop, code with mistakes that they would never have made if they were hand coding it. And they don't care. It's sometimes surprising if not horrifying to find that the colleagues you have worked with for years don't care about quality at all -- almost despising spending time reviewing their own code. Yet they have the audacity to send out code reviews.

A former coworker sent me an AI generated PR to review and I just said NAK after the first two issues I found and I said to not send me AI slop to review.

They went to HR who said I am more senior and I should act as a mentor (they had my same work title and were probably making 4x more due to being in USA) and I just no longer reviewed anything from them until I changed jobs.

People actually go to HR for such trivial things?
All the time. And HR are generally bored and enjoy some drama.
I declined to review it, stating that I couldn't possibly vet 40k lines of code

Gee, that sounds like a job for Claude if there ever was one.

You're absolutely right!
And how would you verify that the review was accurate?
Ask it to prove it.

My approach for AI-first code review, or really any kind of AI technical opinion, is that if the claim AI made is both important and not obviously true at a glance, it has to prove it to me, and keep trying until I'm convinced or can spot an obvious mistake in the proof.

With reviews, this is usually the case where AI is making a claim that something in the PR will fail because of some assumptions or behaviors in code outside of the PR - e.g. "this change will fail in scenario X, because foo is null in this case, because the SQL query doesn't populate it when bar == quux, and it gets propagated as null through the JSON deserialization (optional field)...", where all the SQL and JSON parsing was not part of the code under review, and "bar == quux" is some weird domain special case.

Stuff like this is both critical, and there's no way for me to judge it without an expensive context switch. So I learn to ask for a more detailed walk-through once, and if that doesn't make me "see" it, I just ask it to reproduce it with tests, and confirm it's a real problem. Reviewing the reproduction is usually enough for me to either "see it" or accept they're probably right and ask the author to recheck it.

(Why not jump straight to "reproduce it" for every finding? Because it still takes time to have AI do the repro. It's cheaper than a deep context switch, but not free.)

Same way that I would trust your review to be accurate. Because the reviewer has built a reputation for correctness.

Its not Claude doing the review. Its a human doing the review, but using Claude to do the reading. Its still on the human to ask the right questions to Claude.

For large changes that are not straightforward and include architectural decisions, I wouldn't trust Claude enough to not read most of the code myself. I'll have to read it to be able to understand it and ask about the decisions in detail anyway. And when I start to understand it, it's not uncommon to find out that the solution can be improved and simplified in many places, and after iterating, 25-30% of code disappears.

And trying to just hand-wave it to Claude, to somehow "improve it" or "simplify it", without detailed questions hasn't been very successful. It can work for some things, though.

So you'd produce the code using Claude, and then use Claude to verify it? Would you accept my review of my own code?
Depends. Do you take pills that let you forget that you wrote the code so you can review that same code with fresh eyes that haven't seen that code before? Though you could just use ChatGPT to review the code that Claude wrote if that's really the issue.
You can use Claude for that
Claude to the rescue
Claude Van Damme ?

I'd prefer Chuck to the rescue but I guess it's a cultural preference.

/s

At work we had copilot. It said "the diff is too big to review"
This is a branching point. One dev would find someone else and convince them to approve it. Another would redo the task (code is cheap now, right?) in a PR stack that can actually be reviewed, cleaned up etc.

I hope they were the latter.

My review would have been along the lines of:

'Please split this PR into smaller ones'. I would even sketch which groups/phases would make sense, perhaps with the help of AI.

You could surely check on the status of that PR.