Hacker News new | ask | show | jobs
On Pull Requests (cramer.io)
58 points by cribwi 4421 days ago
6 comments

As far as code reviews go, this is pretty spot on. I was part of a a startup that was using GitHub pull requests for code reviews. As the team grew, it became more and more intractable, although not simply because of notifications. Side-by-side diffs and checkpointed diffs (so that you can see what changed since the last round of review and whether/how your comments were addressed) are handled very poorly by GitHub. We ultimately switched to Phabricator, and while there was a little friction as folks got acquainted with the new tool, it made code reviews a much more pleasant process.

Recently, I had to go through a full code review back on GH pull requests, and it felt like pulling teeth in comparison. They're fine for interacting with contributors to an open source project, but compared to working with a tool like Phabricator that's built for a code reviewer's workflow (and for teams of engineers working together on a project), they just don't hold a candle, in my opinion.

Did you also try (or consider) Gerrit Code Review? If so, how does it compare to Phabricator?
I am not a part of the Docker team, but I think Docker handles this huge mess of notifications by gunsub:

https://github.com/jpetazzo/gunsub

tl;dr: automatically unsubscribe from Github issues that you are not directly involved in.

So, you can get a notification for every issue or pull request that opens in the project or organization that you are a member of, as usual, but you don't get bombarded by every notification, e-mail and status update for the duration of the life of that issue, and associated follow-up e-mails. If you want the notifications, just get involved in the conversation.

I am sure Phabricator is very good, and I'll check it out, but I don't have to sell my boss GitHub, he's already sold on it. We just need to get on feature branches and pull requests (at least if we're going to grow.)

As someone who recently implemented Phabricator at our company, it's really an amazing piece of technology. It's easy to install and configure, it does pretty much everything you might expect it to do and then some, and there's a lot of nice-to-haves (like closing tasks via commits, associating commits to tasks, connecting tasks together, etc).

Also, the ability to set up Herald rules to do any notifications you want, like 'add me to CC for any code review that touches this file path in any of these repositories', 'set up an audit whenever someone commits changes to the master branch of this repository', etc. It has a ton of flexibility for integrating with other services, like allowing people to authenticate through their Google account, GitHub, Twitter, JIRA, Disqus, Twitch.tv, Mozilla Persona, and obviously LDAP as well, or normal Phabricator-specific username/password authentication.

It's really insane how much integration there is, how rapidly the development is progressing (we found a bug on a Friday and it was fixed by Monday afternoon), and how responsive the dev team is (in #phabricator on Freenode).

All that for free. Pretty great.

> Side-by-side diffs and checkpointed diffs (so that you can see what changed since the last round of review and whether/how your comments were addressed) are handled very poorly by GitHub.

Did you consider whether that might've come from having pull requests that are simply too big? I've observed that especially neophyte developers tend to make commits and pull requests that are too big and conflate too many issues into one, that are be very handlable once split into smaller logical units.

We did consider that, and put a lot of effort into keeping pull requests small, but the reality is that GH pull requests just don't get the workflow right for a code review that may involve multiple team members, require several passes back and forth, and can handle larger features.

In my experience, pull requests just aren't the right tool for the job at the moment. That's not to say they don't have their place, but there are definitely superior tools out there.

TL;DR Email filtering + git (the shell tool) + text editor solves all of these problems without using fancy tools, imho.

To me this looks like a typical attempt to solve a work process problem via technology. Using any code review tool doesn't help if you don't get code review done right. If you filter notifications based on rules in the tool or let the developers do it via email filtering is basically a question of philosophy. If you look at a Diff that way or another also doesn't change much about the review process in itself. Don't solve process questions with more software features!

IMHO getting loads of emails is great. I can filter by myself what is interesting to me right now. And worse than the one or two uninteresting emails every quarter is when I don't get the information I need. A non-engineer middle manager is allowed to complain about email overload in my eyes. A developer should be able to handle pure text.

And while we are at it, I also believe that less is more. No matter which repository hosts my code, I do code reviews on my computer with my own developer tools. I'm not even sure how the current version of GitHub or Bitbucket diffs looks like. These websites are basically the shell scripts and disk space that are used to backup my code and allow other people to access it. If a developer wants to do a code review he can also decide on his tools. The rest is just email and "git merge" (--no-ff). But that's just my opinion.

I'm not sure if you're referring to the same situation or not. Can you really handle email from > 100 engineers on the same project? On top of all the non-code review emails you might have to deal with?

Also, you mention doing code review with your own tools on your own machine but I don't think that's the use case being discussed by the OP. It sounds like you're discussing reviewing the code to decide if you should pull it (since it's happening solely on your machine). The OP though is talking about a discussion of the code that happens between 2 or MORE people. Tools help that.

Being able to mark specific lines with comments or questions and have a back and forth between the submitters and reviewers doesn't seem like something easy to do if you're doing all the code review on your local machine.

Take a look at some chromium code reviews. Here's one with about 7 people participating

https://codereview.chromium.org/231133002/

And you can see inline comments, for example here's a few

https://codereview.chromium.org/231133002/diff/120001/cc/ani...

Is there a good way for that to happen offline on your local machine to involve 7 people?

I'm not an expert but I think that's how most Linux or related code is developed. Git + Bash + Vim/Emacs + Mailinglist/direct mail. They solve it by review hierarchy. One person has the responsibility to pull code or not and he in turn asks for a pull request from the maintainer one step higher when he has gathered enough new material. This doesn't just ensure fast development but also ensures that each commit is reviewed a few times before it gets into the main branch.

To answer the question about the >100 emails: Yes. From nearly everywhere I can I pull each and every email and then filter via filter rules and google's priority inbox (which you can mimic with open source tools as far as I've read). I seriously have only one or two emails more to handle than I want to in a span of three months. Sometimes I want an email with a specific topic, sometimes nearly everything a specific person writes or says, and it all gets piped and filtered quite well.

That's the awesomeness of pure text. There are not many limits to what you can achieve by having simple and well understood text formats. On the other hand the messaging in these programs is limited to what the program was meant to do. What a Phabricator filter rule is not able to do you are never able to do with Phabricator notifications. But emails where not meant to be sent and received in today's huge amounts, yet it's not a problem that can't be solved. Even if it is an open source tool that you install on your own server it's quite hard to make it do something different.

I may be amazingly biased given that I work for GitHub, but a lot of these issues are solved by using the Notifications section of GitHub (https://github.com/notifications).

Of the hundreds of repositories we have going internally, the largest is our main GitHub codebase. We have between 80 and 100 developers committing to that one repository every week. This includes something like 160 commits per day across something like 30 pull requests, every day. A lot of us have switched to using the Notifications tab over email (though some use email filtering and others use both). That tab groups all your notifications by project and you can both ack and mute threads easily.

I tend to go to Notifications once a day or so and manage everything - mute the stuff I don't need to hear from again, unwatch repos where I don't need every notification, see where I'm mentioned, etc. The user and team mentions that this author refers to as a "hack" are in fact a very powerful notification feature. If we need someone from a specific team to review something, we mention them or their team. I find it much more powerful and flexible than the more rigid assignment system of other tools, though you can also assign people to the issues associated with a PR in GitHub as well.

The Issues guide on our Guides site is a great overview of these systems and how to use them well for even large teams.

https://guides.github.com/features/issues/

https://guides.github.com/introduction/flow/

I've never been exposed to Phabricator before today, but its site[1] reminded me distinctly of http://xkcd.com/1293 . Pretty sure Phabricator is maintained by Beret Guy.

[1] http://phabricator.org/

What are the various 2014 git code review tools out there one can look into besides Phabricator?
There is Opera Critic [1], which was originally developed for working on the Opera web browser and subsequently open-sourced. I have used it (with a few patches) in conjunction with GitHub to fix many of the issues that are mentioned in the article. In particular it has the following features:

* Works with git. There isn't any VCS abstraction so if you want to use it with mercurial or whatever you're out of luck.

* The review tool (typically) owns the repository that code is pushed to (obviously this is a bit different when integrated with GitHub). Starting a review is typically something like "git push critic branch:r/branch".

* Because reviews are just branches you get all the features that the VCS implicity provides but patch based systems have to replicate (e.g. interdiffs) for free.

* Because reviews are just branches, multiple people can push to the same review so if you want e.g. a database expert to make the database changes for a particular change that is otherwise mostly written by a UI expert, that's possible.

* Assignment to reviewers is based on filters i.e. the tool picks the right reviewer for each change. There may be more than one reviewer for different parts of a single review request (e.g. where you have different owners for the database and ui code and a patch touches both those parts will be reviewed by different people). Reviews can be assigned to specific individuals when required.

* Review progress is tracked per commit and per file, so you can do long reviews in pieces rather than all at once.

* When issues are raised on particular lines of code it optimistically assumes that any subsequent push that touches those lines addresses the issue. If it wasn't addressed the reviewer reopens the issue. This works surprisingly well in practice.

* Rebases of the review branch are handled through "equivalent merge commits", which means that you retain all comments and progress from before the rebase.

* the UI is generally very "engineer designed" and won't win any beauty contests. But it has essential features that some other tools inexplicably lack, such as side-by-side diffs and syntax highlighting.

[1] https://critic-review.org

I found the website very lacking. It's cool that I can see the tool in action. But some context and explanations would be much appreciated.
Agreed. Unfortunately no one involved with the project has the right combination of skills and motivation to do a nice introductory website. It's a pity because technically the tool is great, but the PR/design side is almost entirely lacking.
For the product that I'm working on, we are trying to take code reviews to another level by making it very easy for you to implement your own code review logic. We have what we call Smart Attributes, which are like metadata for your commits, diffs, etc., but unlike traditional metadata, they can be programmed in JavaScript to react to what they are attached to. In the following screenshot, you'll find an example of how you can use Smart Attributes to make code reviews a little more intelligent:

http://screenshots.gitsense.com/intelligent-code-reviews.htm...

There is a Smart Attribute that has been programmed to track who has approved/rejected diffs at the file and directory level. You can find the source for this attribute here:

https://github.com/gitsense/smart-attributes-for-code-review...

There is an attribute that has been programmed to run Google's style guide lint tool against C++ files in the code review. You can find the source for this attribute here:

https://github.com/gitsense/smart-attributes-for-code-review...

And there is an attribute has been programmed to generate an activity feed. You can find the source for this attribute here:

https://github.com/gitsense/smart-attributes-for-code-review...

With Smart Attributes, you can implement whatever code review logic you want. And if you've created a pretty useful one, you can share it with others or sell it, if you think somebody will pay for it.

Gerrit is really good for code reviews
Just to give a second opinion... I do not like working with Gerrit (I work with Android, for a HW vendor). In particular, reviewing one patch at a time becomes painful very quickly. Especially for complex features that end up having many changesets, because it's difficult to review what was changed in the newest changeset.

I would prefer having a branch to review rather than a single patch. When there are fixes, add a new commit to the branch and squash/rebase before merging to master. This way it would be easier to see what is happening during the development of a more complex feature with many rounds of review and fixing.

(note: we might not be using the latest version of Gerrit and our version control practices could be better)

On the third hand: while pushing a line of commits to Gerrit does create separate reviews for each, there's nothing stopping you from pulling the entire branch down and looking at it as a whole, as well as being able to make sure each individual commit makes some level of sense. I especially like how Gerrit+Jenkins lets you build every revision. I don't like the way that GitHub and similar tools (thinking specifically of Stash) make it more difficult to look at the individual commits.

I'm also a fan of Gerrit having to option to enforce fast-forwards, something GitHub doesn't allow: it will always make a merge commit, even if there's only one commit and its parent is HEAD of the branch being merged to.

There's ReviewBoard[1]. It has a pretty decent UI as code review tools go, and it's not tightly coupled to any particular VCS. That has the downside that you don't necessarily have some of the collaboration niceties that the likes of Critic has. It's easy enough to install into a virtual environment with virtualenv, so it's easy enough to try out without needing to run anything as root. It supports automatic assignment of reviewers based on certain criteria, such as the type of files being reviewed.

[1]: http://www.reviewboard.org/

I make a product called Kiln that has pretty great code reviews for git: http://www.fogcreek.com/kiln/features/code-reviews/
Atlassian Stash provides self-hosted git repository management that provides code review, and is getting better with every release. It is based on the concept of pull requests also but provides workflow better suited to teams, including the ability to require successful builds and reviewer approvals. In addition it has awesome diff views and a plugin architecture for customising behaviour and integrating with other development tooling.

https://www.atlassian.com/software/stash

Disclosure: I work for Atlassian

I'd love to know how GitLab compares.
One of my biggest problems with pull requests is the tendency to have to merge a bunch of "Fix typo", "Refactor foo" style commits.

I prefer the workflow for sending patches on mailing lists: Email patch, get feedback, make new commits to address feedback, rebase (important step), repeat.

Why is that inherently better than a PR? I've had people give me feedback on my PR, I make the changes, then update the PR with the correct SHA.

I'm honestly curious. PR are a tool, you can misuse any tool.

You can force push to your remote branch, but I personally like to keep my branches local to my machine and simply send along .patch files that can be applied with git am.
> I personally like to keep my branches local to my machine

Are you saying you prefer to deny others the ability to directly interact with the same git structure you have in their git tools (by simply fetching from a remote), instead forcing them to apply a patch from an email and hope everything goes right? If so, why would you do that? This seems like it's a little idiosyncrasy you have that has little to do with the actual topic at hand.

No, I'm not saying that, but I contribute to some projects that do not use GitHub. I send patches to a mailing list, the maintainers review them, and then apply them when they're acceptable. I'm not saying that this is the only way to do it, but I'm pointing out that this workflow has a nice side-effect: clean patches. With GitHub's pull requests, you would typically respond to feedback by pushing some new commits ("fix typo", "add test", etc.). When I am sending patch files to a mailing list, I respond to feedback by making new commits locally, squashing them into the original patch set, and generating the patch set again. This keeps the git history clean and focused. The patches stand alone. Tools like Gerrit can also do this. I'm not endorsing one true workflow, but I am saying pull requests are inadequate.
> With GitHub's pull requests, you would typically respond to feedback by pushing some new commits ("fix typo", "add test", etc.).

Sounds like your problem isn't with pull requests, but with neophytes you have experienced in github projects you looked at, who do not rebase and force push.

In the majority of projects (excepting those run by neophytes) i participate in, whether on github or off, there has always been a clear line of pull requests required to be clean, meaning not only should the commits not contain cruft-cleanup, they also have to be on HEAD, contain Changes-file adjustments, contain tests, etc. When patch makers get these instructions and force-push that kind of thing, it's a breeze for maintainers to do a fetch, then compare their local marker on the requesters branch with the new state of the remote branch.

Having a requester go "no, i only send emails" in that situation would not be constructive.

You don't need to force push at all. You could just apply the changes to your PR in a different branch(completely local, rewrite history as much as you want, etc, etc) and then submit a second PR with that.
A whole new PR? That just shows how broken PRs are if that's the solution to this problem.
It's _A_ solution, but not a good one. I think most people consider it OK to force-push a PR branch and update the existing pull request.
To add another perspective, I very much dislike getting emailed patches. Creates more work for me. We use Phab add getting an email with a link, clicking the link and immediately seeing the diff is super easy. Reduces friction to increase use.