by tjradcliffe 4190 days ago
Code doesn't capture intent in many critical cases, so figuring out what a piece of code is supposed to do is different from figuring out what it does. This is true in part because there are very different levels of abstraction involved.

To take a trivial example:

  norm = sqrt(x[0]**2 + x[1]**2 + x[2]**2)
  x[0] /= norm
  x[1] /= norm
  x[2] /= norm

This could be described as "take the square root of the sum of the squares of three values, then divide each value by the result" or as "renormalize a vector". The latter is by far the more meaningful and useful description, because it is pitched at the level of abstraction the reader is likely interested in.

You could say "well, why not create a function called 'renormalize_vector' so it would be self-documenting?" Fine, but now you have a function call per renormalization, and that has a cost that may be unacceptable. In many simulations, renormalizing vectors whose norm is already near unity is a big overhead, to the extent that I've written custom code to handle that special case and implemented it as a macro I might call "FAST_RENORM_NEAR_UNITY"... but what does "near unity" mean? What trade-offs went into the design choices? What code isn't there because I tried it and it didn't work well?
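The post doesn't say how the near-unity special case is implemented, so what follows is only a sketch under an assumption: it uses one Newton-Raphson step for 1/sqrt(s) from a starting guess of 1, which reduces to the scale factor (3 - s)/2 and needs neither a square root nor a division. The names `fast_renorm_near_unity` and `NEAR_UNITY_TOL`, the tolerance value, and the "rejected alternative" are hypothetical illustrations. The point is the docstring: it records what "near unity" means, what error that buys, and what was ruled out, which is exactly what the code alone can't tell you. (In Python neither version is fast; the savings only matter in compiled code.)

```python
import math

def renormalize_vector(x):
    """Renormalize: scale the 3-vector x in place to unit length."""
    norm = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
    x[0] /= norm
    x[1] /= norm
    x[2] /= norm

# Hypothetical tolerance on the squared length; not taken from the post.
NEAR_UNITY_TOL = 1e-3

def fast_renorm_near_unity(x):
    """Renormalize x in place, assuming it is already close to unit length.

    Purpose: undo slow drift in vectors that are meant to stay unit length,
    without a sqrt or a division.

    "Near unity" (assumed): s = |x|**2 satisfies |s - 1| <= NEAR_UNITY_TOL.
    Method: one Newton-Raphson step for 1/sqrt(s) from a starting guess of 1,
    giving the scale factor (3 - s) / 2. With s = 1 + e, the resulting length
    is about 1 - (3/8) * e**2, i.e. within roughly 4e-7 of 1 at the tolerance.

    Rejected alternative (illustrative): always calling renormalize_vector.
    It is exact, but pays for a sqrt and a division on every call.
    """
    s = x[0] * x[0] + x[1] * x[1] + x[2] * x[2]
    # Debug-only precondition check (skipped under `python -O`).
    assert abs(s - 1.0) <= NEAR_UNITY_TOL, "vector is not near unit length"
    scale = (3.0 - s) * 0.5  # first-order approximation of 1/sqrt(s)
    x[0] *= scale
    x[1] *= scale
    x[2] *= scale
```

If inputs can drift outside the tolerance, falling back to the exact `renormalize_vector` is the safe design choice.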

People who advocate self-documenting code generally talk as if self-documenting techniques come at zero cost (adding a function call is an unacceptably high cost in some cases), and as if the code that exists adequately captures all the thinking that went into it (it does not and cannot).

So while I'm all for as much self-documentation as possible, any non-trivial code is going to require additional documentation to a) describe the purpose in high-level terms and b) capture the alternatives that were rejected and why.

Unfortunately, for open source projects especially, there is a law of documentation that says power*documentation=constant, so the most powerful code has the worst documentation, and there are projects with great documentation that simply don't do much.


This, a hundred times. Comments should always be about intent, never about what's actually happening (the mechanistic description). I don't need help understanding the code as I read it; I need to know WHY the code was written, so that I can debug, follow code paths, and skim.

A further advantage: such comments have longer half-lives. A rewritten method may still serve the same purpose long after all of its details have changed.
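To illustrate the half-life point, here is a hypothetical before/after pair in Python (the names `renormalize_v1` and `renormalize_v2` are made up for this sketch). The implementation changes completely, with the second version switching to `math.hypot`, which accepts any number of coordinates since Python 3.8, yet the intent comment stays accurate word for word.

```python
import math

# Renormalize: scale x so it has unit length, preserving its direction.
def renormalize_v1(x):
    norm = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
    return [c / norm for c in x]

# Renormalize: scale x so it has unit length, preserving its direction.
def renormalize_v2(x):
    # Rewritten: math.hypot computes the Euclidean norm directly, is designed
    # to avoid overflow in the intermediate sum of squares, and works for
    # vectors of any length.
    norm = math.hypot(*x)
    return [c / norm for c in x]
```

A mechanistic comment on the first version ("square each component, sum, take the square root") would have been wrong after the rewrite; the intent comment was not. The rewrite also changes behavior at the edges: `renormalize_v1([1e200, 1e200, 0.0])` raises `OverflowError` because `1e200 ** 2` overflows a float, while `renormalize_v2` handles it.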

> but now you have a function call per renormalization and that has a cost that may be unacceptable.

I would go for the function, and pass along Knuth's advice about premature optimization. If you're writing at such a low level that function calls really aren't acceptable, go with a comment: "// renormalize vector". Your instinct should be the function, though. I bet there is more than one vector normalization going on in this hypothetical codebase, and that line looks pretty typo-prone.
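In the spirit of Knuth's advice, the cost of the call can be measured before deciding against it. Below is a minimal Python sketch using the standard `timeit` module that times the same renormalization written inline versus called through a function. Timings depend entirely on the interpreter and hardware, so no numbers are claimed here; also note that in a compiled language an optimizing compiler can often inline a small function like this, which changes the calculus.

```python
import math
import timeit

def renormalize_vector(x):
    """Scale the 3-vector x in place so it has unit length."""
    norm = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
    x[0] /= norm
    x[1] /= norm
    x[2] /= norm

SETUP = "x = [1.0, 2.0, 2.0]"

# The same work written inline, as one semicolon-separated timeit statement.
INLINE = (
    "norm = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2); "
    "x[0] /= norm; x[1] /= norm; x[2] /= norm"
)
CALL = "renormalize_vector(x)"

def compare(number=200_000):
    """Return (inline_seconds, call_seconds) for `number` renormalizations."""
    env = {"math": math, "renormalize_vector": renormalize_vector}
    t_inline = timeit.timeit(INLINE, setup=SETUP, number=number, globals=env)
    t_call = timeit.timeit(CALL, setup=SETUP, number=number, globals=env)
    return t_inline, t_call

if __name__ == "__main__":
    t_inline, t_call = compare()
    print(f"inline: {t_inline:.3f} s   function call: {t_call:.3f} s")
```

If the measured difference is negligible next to the rest of the simulation step, the function (and its single, typo-free definition) wins.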