Hacker News new | ask | show | jobs
by cryptonector 2915 days ago
That's all true, but (and note: I didn't downvote you) the problem I have with email for code review is that finding all the comments made over time on one patch is... a chore. Mind you, for many years I did code review over email, so I'm used to it, and since I'm a TUI sort of person, it works very very well for me except for the chore that is finding and collating all comments.

This chore is very real, and very time consuming. Reviewers need to make sure that new versions of a patch are responsive to past [accepted] commentary. Authors need to find all commentary when updating patches in order to be responsive to it.

Not that code review tools from GitHub, GitLab, and others, are all that good. I've yet to find a code review tool that really works for me and the people I work with.

Whether you use email as a transport and/or interface, or something more weby, or whatever, at the end of the day, for me, the real win lies in code review and commentary management and UIs. Because I'm a TUI person, my approach would be to build a code/patch+commentary format, something like (designing on the fly here):

    - foo.c
    |     /* This is code (or patch) */
    ? jane   ^^^^^^^^^^^^
    ? jon    The above is code, yes, but this is commentary!
    |     while (!review_done)
    |         ...;
or

    - 0001-This-is-a.patch
    | diff ...
    ...
Something more markdown-y would work too.

An XML version wouldn't be a bad idea, provided a) that the schema is trivial, b) that there are tools for converting between a simple text format (e.g., like the above), c) preferably things round-trip. Obviously, with XSLT/XPath, (b) should be feasible, especially if we heed (a). And (c) should be possible if the XML schema is kept simple enough (but see below).

What's nice about this is that as long as there is version information we can merge commentary from many emails into one document (per file/patch), which then could make automation of commentary collation possible.

This needs lots more thought, I grant that. For example: how to cut-down quoted text without losing the ability to trivially merge commentary? Also, it would be very tempting to add lots of metadata, which would drive one to XML. I'm not a huge fan of XML, mostly due to lack of TUI editors for it... but XML does have the advantage of XSLT/XPath.

It seems to me that this is a problem that can be solved reasonably well, and that this is a problem worth solving. If we can solve it in a way that works well with TUIs, it will also work well with email, so this should be of interest to you (I hope so as you seem to have the time and energy to work on this!).

Thoughts?

5 comments

Thanks for your detailed thoughts, I'm taking all of this down and using it to influence my work on lists.sr.ht (the mailing list and code review tool I'm building).

The chores of this approach - finding and collating feedback, sheparding a patch through the flow, and so on - are all things I'm acutely aware of. I don't think we need to throw the baby out with the bathwater like GitHub does, though, and I see these as places to improve our tools rather than replace them.

Take a look at this here:

https://git.sr.ht/~emersion/python-emailthreads/tree/test/da...

The specific output format you see here is just a testing tool, this isn't the final user-facing product. The important part is ingesting a bunch of emails and creating a thread where you can associate all of the responses over time with the particular parts of the patch. Combining this with a gerrit-like tool which tracks new patches over time and avoids duplicating review work is planned.

Thinking about it more... There is not much need for XML.

The inputs are plaintext replies (or HTML formatted as plaintext).

The original document is immutable.

There is no state (because it can be collated from scratch every time). EDIT: If updates are allowed in any non-email based way (e.g., via a BUI, REST APIs) then email must be generated and sent so that all the state is in the email archives and all participants can collate the same state.

But the result of the collation is more like a database and less like a document. In particular there's not much need for marking up commentary.

Multiple collation output formats will be needed: plain text, XML, XHTML, JSON. These formats/schemas are only needed for display purposes, not interchange, as the collation results can be computed independently by all who have access to the emails. This means there is nothing to standardize!

Thus collation state can be kept any way you want. XML, SQLite3, some NoSQL, any in-memory representation you like -- whatever suits you. Still, I'd use anything that has a nice, declarative query language. XML (XSLT/XPath), JSON (JSONPath, jq), or an RDBMS (SQL) -- all will do. For you, and given your emphasis on open source, SQLite3 or PostgreSQL will work best, I think. You can generate all of the output formats you need from such a database.

No, of course we shouldn't throw the baby out with the bathwater. But this is, for me and many others, the biggest issue in using an email-driven approach to code review and patch integration -- I say this as a very TUI-focused user, a tmux/mutt/vim/cscope/shell/... power user. That's not a fatal problem, but a critical challenge. Simply put, without a way to address this issue satisfactorily, an email-driven approach to patch integration will only be a niche thing. Now, mutt and such only work for a small enough number of people, so they are niche tools too. But email-driven patch integration will be even more niche, though it could be at least as popular as vim if this problem is solved.

Another thing is that with an approach to email code review that solves the collation problem you might... be able to seamlessly bridge email and web interfaces. For example, GitHub does actually have an email interface -- a very poor interface (markdown won't be interpreted, yo wth GH?) and only for issues, not code review. But your service could provide a superior email interface to code review... that also plays well with the web. I'm glad you're already thinking of this.

(GH doesn't care about email integration because with a REST API and BUI they can cover all their users' needs (except email-centrism!) without having to deal with the challenge of email issues (like this collation issue). GH has chosen to not bother with email, while you're taking up the challenge. As a lover of email, I say good for you (and for the rest of us)!)

Again it's tempting to think of XML. If you want to convert to a plain text format, XSLT can do it. And if you want to convert it to HTML, XSLT can do it. Or you can fetch the XML with XHR. Or convert it to JSON for fetching with XHR... again, XSLT can do it (well, I've never written XSLT code to produce JSON, and I'm guessing doing it right isn't easy, but maybe newer XSLT iterations have support for JSON? idk, I've not followed it). If you use JSON you could use jq, natch, but JSON is a poor format for what is really document-like data, so I'd stay away from JSON. Even if you choose not to use XML (I wouldn't blame you! XML is for masochists), thinking about this problem with XML in mind might help you design a non-XML format that covers all the things you need to cover.

I'm not just a TUI power user. I'm also a jq and XSLT power user, meaning that APIs (meaning, nowadays, "RESTful" APIs) are very convenient. Structured document formats add a lot of value, but in an email context you just can't access that value due to lack of a) a notion of "email API", b) MUA integration with editors for different MIME types, c) a total dearth of editors for those formats lying in between vim on the TUI side and Word on the GUI side. In this particular context you have the benefit that you can consumer plain text and extract from it commentary due to the traditional email angle-bracket quoting style, then merge that content into a structured document from which you can render plain text again if desired.

Anyways, really, best of luck. If you want to exchange further thoughts off HN, I'm sure you can find my email address.

My solution to the "find all the comments made on one patch" is variously (a) look at that email subthread only or (b) use a webinterface like patchwork (eg https://patchwork.ozlabs.org/patch/938069/) or patchew (eg http://patchew.org/QEMU/20180629162122.19376-1-peter.maydell...). patches (https://github.com/stefanha/patches) can handle the tedious bit of collecting up all the Reviewed-by: tags people sent for you.

The key thing about any tool that's trying to improve an email-based workflow is that it has to interoperate with people still using the traditional "just an email client" tools. Better web UIs for people who want to use web UIs are something I'd love to see, but a big-bang switchover to a completely different format won't work for large projects, where mandating changes of tooling is a really hard sell. All the three tools I list above take the "interoperate with the existing patches-in-email transport" approach.

Yes, the difficulty lies in having to deal with just-whatever-the-users-and-their-MUAs-do. It's a bit of a mess. But a bit of convention can be taught to users -- think of Markdown, which has been quite a success because it's just a bit of convention that users were often accidentally using anyways. A bit of convention can go a long way.

The real challenge is that the GUI MUAs often make it hard for users to use even a bit of convention in plain text... I'm thinking of Outlook in particular, but it's not the only one. Even with an MUA that does angle-bracket quoting the right way and encourages users to trim quotes and bottom-post, the fact is that editing plain text is difficult and boring for most non-vim/emacs users out there, and that is most users. This problem can't be solved by the author of TFA, but if the author of TFA solves the problems that are in their bailiwick, then that could add pressure on MUA developers.

Hi,

I'm the author of emailthreads, mentionned in the article. I believe the library is already able to do something pretty close to what you want. See for example this output: https://github.com/emersion/python-emailthreads/blob/master/... (you can find source messages in the parent directory)

The goal is to eventually build a web UI around this library.

Oh, I skipped that paragraph. Oy, sorry. I think the text format for interleaving source and commentary needs its own discussion somewhere, and probably a standalone repository separate from implementation (and some day preferably too, an RFC, or at least a MIME type registration).

I've looked at the examples you linked, and... I like that the review format for review authoring is just plain trivial. Not sure how it works with quote trimming... I'm not sure I like the output format, but I'll go think about it.

>I think the text format for interleaving source and commentary needs its own discussion somewhere, and probably a standalone repository separate from implementation (and some day preferably too, an RFC, or at least a MIME type registration).

Yeah, this library just parses email threads and the goal isn't to define a new format. This particular output format is just for debugging and unit testing, it's not meant to e actually used as is. The library user just gets a tree of text fragments with metadata, it's up to him how the output will look like.

The output format is just a means of testing the library, not the final user-facing form this will take.
What's the main difference between the library you're building and mail parser?:

https://pypi.org/project/mail-parser/

Since there isn't any documentation, I'm just trying to figure out what's it's actually doing (without reading the source).

The library emersion is working on does not parse an email, it parses a thread of emails. It can relate several emails together into a tree of responses to the original email and break down the responses to e.g. particular lines of code in a patch.
Have you tried Phabricator? It tracks completion status of comments on a review. Letting you easily find the un-finished ones. Comments can also apply to single or multi-line sections of the file, and they stay present even as the patch itself is updated in response to comments.
Phabricator is decent in many ways, but it's terrible at handling patch series. Most people don't seem to be aware of the feature at all, and it's not integrated at all into tools like arcanist.
> finding all the comments made over time on one patch is... a chore

True enough. Not repeat that analysis in 17 years. Archives of LKML go back to the list's creation over two decades ago. How long do you expect your github page to last before the format changes or the project gets dropped or pickled?

I feel like you missed the whole point of this sub-thread. No one in it is arguing for not using email for reviews. Rather, we're talking about how to make it easier to use email for reviews.