Hacker News new | ask | show | jobs
by JohnBooty 2491 days ago

    Documentation is essential. How things work is an important thing to 
    document. Ideally it should be in version control and be generated 
    from the code, because then it's less likely to go out of date. 
My solution to this is old and fairly unpopular, but I stand by it: anything in the codebase that's not obvious to a new maintainer should have a brief, explanatory code comment.

Generally, this falls into two categories.

1. Hacks/kludges to get around bugs in hardware, external services, or included libraries. These manifest in code as incomprehensible, ugly bits of code that are difficult to distinguish from code that is simply "sloppy" or uninformed. More importantly, they represent hard-won knowledge. It often takes many programmer-hours to discover that knowledge, and therefore many dollars. Why throw it away? (Tip: include the version of the dependency in the comment, ie)

    # work around bug in libfoo 2.3, see blahblahblah.com/issues/libfoo/48987 for info
    # should go away once we can upgrade to libfoo 3..
    if error_code == 42 reset_buffer()
...so that future programmers (including you) can more easily judge whether the kludge is still needed in the future.

2. Business logic. This too is difficult/impossible to discern from looking at code. Often, one's git commit history is sufficient. But there are any number of scenarios where version control history can become divorced from the code, or require a fair bit of git/hg/svn/whatever spelunking to access. And this of course becomes increasingly onerous as a module grows. If there are 200 lines of code in a given module, it is a significant time investment to go git spelunking for the origins of all 200 lines of code. Some concise internal documentation in the form of code comments can save an order of magnitude or two of effort.

    It still has problems (What do you do when the code and the 
    documentation disagree? Which is correct?), but they're not as 
    severe as the problems that arise when there is no documentation at all.
This is pretty easy to enforce at code review time, prior to merging.

In the first place, only a true maniac would intentionally update

    # no sales tax in Kerplakistan on Mondays
    return nil if country_code==56 and day_of_week==1
 
...without updating the associated comment. If they do neglect to update it, that's an easy catch at review time.
6 comments

Count me in as another old timer who agrees. I had a friend once throw the "code should be self-documenting" line at me once and it upsets me. That only really applies for code that is so simple it writes itself, and never has any gotchas hiding (and which useful project is like that?).

Leaning towards commenting "why" not "what" is another good general rule. "Self-documenting code" with sensible function and variables names and logical flow already cover the "what" fairly well.

While I still would add a comment about the why, your last bit of code probably should be written without magic constants.

    # Some countries have sales tax rules dependent on the day of the week
    return nil if country_code==KERPLAKISTAN and day_of_week==MONDAY

The exact comment here could probably be more specific (e.g. where do you find these rules), but it also most likely shouldn't repeat the code (and the code should make clear what it represents).
If you do the substitution as you suggest and then add a unit test, then you have something ;-) Something on the lines of "describe countries with sales tax dependent upon the days of the week => Kerplakistan doesn't have sales tax on Mondays" So now it's self documented and self testing.

But I agree with your statement that there should be a pointer to the business rules somewhere. Otherwise it's difficult to have a meeting with the business side and ask, "Has anything here changed?" I think that's the biggest thing people miss out -- It's not that hard to find the thing in the code if things change. It's super hard to make sure you are on top of all the business requirement changes.

But don't do what one memorably awful project I had to maintain did - to use that example they would have done:

country_code==FIFTY_FIVE and day_of_week==ONE

But what if the definition of 55 changes? You'll be glad to have your table of constants then.
The project also defined HTTP, COLON, SLASH, WWW and DOT so that you would have:

   string url = HTTP + COLON + SLASH + SLASH + WWW + DOT ...
I swear I'm not making this up....
Reminds me of http://pk.org/rutgers/notes/pikestyle.html

> "There is a famously bad comment style: ...

Don't laugh now, wait until you see it in real life."

Well at least a typo sould give a compile time error for some subset of typos.

But in the trade-off in code readability was probably the cause of many other mistakes, so probably ended up further behind.

Sounds like a PHP codebase I'm currently working in. I shit you not, $LI = '<li>' is in the functions file, along with $LI_END.
It was a very enterprisey Java codebase from the bad old days of J2EE - it had somewhere over 30 layers of abstractions between the code in a JSP and a web service call.

[NB 30 isn't an exaggeration - I think the vast team who wrote it were paid by the abstraction or something].

But how else will your compiler tell you that you mistyped 55? /s
That's a very good point. Avoiding magic numbers would have removed the need for an explanatory comment in my example.
Comments are often a code smell. In lots of examples, better variable naming, breaking something out into a function, or constants often reduces the need for a code comment.
I disagree. Code ages and people move on. 2 years down the line some new guys are maintaining the code base. Some new guy is testing the system and notices that sales tax values seem to be "strange" for Kerplakistan on certain days of the week so they create a ticket for it. Then that goes through the typical pipeline. Another member of the team gets assigned the issue and looks into it. They come across the line:

  return nil if country_code==KERPLAKISTAN and day_of_week==MONDAY
Hmm.. Well that's strange. I don't have a background in Kerplakistan monetary policy so I don't know why we aren't assessing sales tax on Monday. Perhaps Kerplakistan is a special case. Is that being handled somewhere downstream? Then 1-2 hours later, after shuffling through source and eventually just Googling Kerplakistan sales taxes, you discover what someone found out 2 years ago when they wrote that line. Now you resolve the ticket and move on with your day but you just wasted a couple man-hours on a non-issue that could have been resolved instantly from a code comment.

Comments are as much for the next guy as they are for you.

Without a more concrete example, it's difficult to suggest what the better fix would be.

Code smell doesn't mean you should never do it, just that often there's a better way.

Here's a more real-world example.

I worked on an enterprisey line of business app that assigned sales leads to salespeople.

The algorithm to do this was a multi-step process that was (1) rather complex (2) constantly being tweaked (3) very successful (4) contained a number of weighting factors that were utterly arbitrary even to veterans of this app.

It was full of many `if country_code==KERPLAKISTAN && day_of_week==MONDAY` -style weighting factors. Each represented some hard-won experience And when I say "hard-won" I mean "expensive" -- generating leads is expensive business.

We had a strong culture of informative commit messages, but this file had hundreds if not thousands of commits over the years.

It was the kind of code that resisted serious refactoring or a more streamlined design because it was a recipient of frequent change requests.

A few human-readable comments here and there went a loooong way toward taming the insanity and allowing that module to be worked on by developers besides the original author.

Knowing the why for many of these rules made it much easier to work with, and also allowed developers to be educated about the business itself.

I agree. The most obvious place to find an explanation of a piece of code, is right beside that code. Not hidden away in some git commit message or nested away in confluence.
>anything in the codebase that's not obvious to a new maintainer should have a brief, explanatory code comment

I'm not at all convinced that this is unpopular, but I think it's a whole lot harder than you're letting on. Unless you have a constant stream of new people coming in and you can convince them to give honest feedback, you don't actually know what's not obvious.

Why not:

    return nil if country_code==KERPLAKISTAN and day_of_week==MONDAY
Then you don't need comments and the sync problem goes away?
Except this doesn't retain the crucial information: why? It looks arbitrary. The thought that "some countries have sales tax rules dependent on the day of the week" may or may not be obvious from the context. At the very least, the comment pins a point in the space of all possible reasons for that piece of code - with it, you know it's related to sales tax and week days, and isn't e.g. a workaround for the bug with NaNs in tax rates that you saw on the issue tracker last week.
This is admittedly a trivial example, but ideally you want developers who understand why we're doing this.

Is this a quick thing somebody hacked in for a special, one-off, tax-free month in Kerplakistan as the country celebrates the birth of a princess?

Is this a permanent thing? Will there eventually be more weirdo tax rules for this country? Will there be others for other countries?

Knowing the "why" would help a developer understand the business, and reason about how best to work with this bit of code... should we just leave this ugly little special case in place? Should we have a more robust, extracted tax code module, etc.?

Commit messages help to accomplish this too, and can offer richer context than inline comments. Each has their place. Sifting through hundreds of commit messages in a frequently-updated module is not a great way to learn about the current state of the module, as the majority of those commit messages may well be utterly stale.

Ultimately the cost of having some concise inline comments is rather low, and the potential payoff is very large.

Remember that the longer term goal (besides the success of the business) of software is to have your developers gain institutional knowledge so that they can make more informed engineering decisions in the future.

Yup. This + some diagrams for models and infrastructure is plenty