My approach for AI-first code review, or really any kind of AI technical opinion, is that if the claim AI made is both important and not obviously true at a glance, it has to prove it to me, and keep trying until I'm convinced or can spot an obvious mistake in the proof.
With reviews, this is usually the case where AI is making a claim that something in the PR will fail because of some assumptions or behaviors in code outside of the PR - e.g. "this change will fail in scenario X, because foo is null in this case, because the SQL query doesn't populate it when bar == quux, and it gets propagated as null through the JSON deserialization (optional field)...", where all the SQL and JSON parsing was not part of the code under review, and "bar == quux" is some weird domain special case.
Stuff like this is both critical, and there's no way for me to judge it without an expensive context switch. So I learn to ask for a more detailed walk-through once, and if that doesn't make me "see" it, I just ask it to reproduce it with tests, and confirm it's a real problem. Reviewing the reproduction is usually enough for me to either "see it" or accept they're probably right and ask the author to recheck it.
(Why not jump straight to "reproduce it" for every finding? Because it still takes time to have AI do the repro. It's cheaper than a deep context switch, but not free.)
Same way that I would trust your review to be accurate. Because the reviewer has built a reputation for correctness.
Its not Claude doing the review. Its a human doing the review, but using Claude to do the reading. Its still on the human to ask the right questions to Claude.
For large changes that are not straightforward and include architectural decisions, I wouldn't trust Claude enough to not read most of the code myself. I'll have to read it to be able to understand it and ask about the decisions in detail anyway. And when I start to understand it, it's not uncommon to find out that the solution can be improved and simplified in many places, and after iterating, 25-30% of code disappears.
And trying to just hand-wave it to Claude, to somehow "improve it" or "simplify it", without detailed questions hasn't been very successful. It can work for some things, though.
Depends. Do you take pills that let you forget that you wrote the code so you can review that same code with fresh eyes that haven't seen that code before? Though you could just use ChatGPT to review the code that Claude wrote if that's really the issue.
My approach for AI-first code review, or really any kind of AI technical opinion, is that if the claim AI made is both important and not obviously true at a glance, it has to prove it to me, and keep trying until I'm convinced or can spot an obvious mistake in the proof.
With reviews, this is usually the case where AI is making a claim that something in the PR will fail because of some assumptions or behaviors in code outside of the PR - e.g. "this change will fail in scenario X, because foo is null in this case, because the SQL query doesn't populate it when bar == quux, and it gets propagated as null through the JSON deserialization (optional field)...", where all the SQL and JSON parsing was not part of the code under review, and "bar == quux" is some weird domain special case.
Stuff like this is both critical, and there's no way for me to judge it without an expensive context switch. So I learn to ask for a more detailed walk-through once, and if that doesn't make me "see" it, I just ask it to reproduce it with tests, and confirm it's a real problem. Reviewing the reproduction is usually enough for me to either "see it" or accept they're probably right and ask the author to recheck it.
(Why not jump straight to "reproduce it" for every finding? Because it still takes time to have AI do the repro. It's cheaper than a deep context switch, but not free.)