| I’m working on something in a similar direction and would appreciate feedback from people who’ve built or operated this kind of thing at scale. The idea is to hook into Bitbucket PR webhooks so that whenever a PR is raised on any repo, Jenkins spins up an isolated job that acts as an automated code reviewer. That job would pull the base branch and the feature branch, compute the diff, and use that as input for an AI-based review step. The prompt would ask the reviewer to behave like a senior engineer or architect, follow common industry review standards, and return structured feedback - explicitly separating must-have issues from nice-to-have improvements. The output would be generated as markdown and posted back to the PR, either as a comment or some attached artifact, so it’s visible alongside human review. The intent isn’t to replace human reviewers, but to catch obvious issues early and reduce review load. What I’m unsure about is whether diff-only context is actually sufficient for meaningful reviews, or if this becomes misleading without deeper repo and architectural awareness. I’m also concerned about failure modes - for example, noisy or overconfident comments, review fatigue, or teams starting to trust automated feedback more than they should. If you’ve tried something like this with Bitbucket/Jenkins, or think this is fundamentally a bad idea, I’d really like to hear why. I’m especially interested in practical lessons. |
The results of a diff-only review won't be very good. The good AI reviewers have ways to index your codebase and use tool searches to add more relevant context to the review prompt. Like some of them have definitely flagged legit bugs in review that were not apparent from the diff alone. And that makes a lot of sense because the best human reviewers tend to have a lot of knowledge about the codebase, like "you should use X helper function in Y file that already solves this".