It's difficult to spot issues from a huge diff.
But when an agent finishes a task, it remembers what it just went through: where it got stuck, when it got corrected by a human, what code made it want to say the F word to the author. That would be a good chance to make the codebase cleaner.
I got to see greptile and it did a pretty decent code review, somewhat like a static analysis tool but without a lot of the time-wasting nonsense/false positives.
When I've used static analysis tools, the first run is usually helpful because you cherry-pick the things that need to be fixed, but subsequent runs turn up only the false positives or "only slightly a nit" kind of annoyances.
But human developers are the ones who say stuff like "Do we really have to use a database at all?" etc.