Hacker News new | ask | show | jobs
by bamfly 1061 days ago
This is why I comment a "why" for any line of code that's not incredibly obvious. And 100% of the time when it's due to interaction with something outside the codebase, whether that's an OS, filesystem, database, HTTP endpoint, hardware, whatever, if it's not some straightforward call to some API or library.

Sleep due to rate limiting from another service? COMMENT. Who's requiring it, the limits if I know exactly what they are at the time (and noting that I do not and this is just an educated guess that seems to work, if not), what the system behavior might look like if it's exceeded (if I know). Using a database for something trivial the filesystem could plausibly do, but in-fact cannot for very good reasons (say, your only ergonomic way to access the FS the way that you need to, in that environment, results in resource exhaustion via runaway syscalls under load)? Comment. Workaround for a bug in some widely-used library that Ubuntu inexplicably refuses to fix in their LTS release? Comment. That kind of thing.

I have written so very many "yes, I know this sucks, but here's why..." comments.

I also do it when I write code that I know won't do well at larger scale, but can't be bothered to make it more scalable just then, and it doesn't need to be under current expectations (which, 99% of the time, ends up being fine indefinitely). But that may be more about protecting my ego. :-) "Yes, I know this is reading the whole file into memory, but since this is just a batch-job program with an infrequent and predictable invocation and this file is expected to be smallish... whatever. If you're running out of memory, maybe start debugging here. If you're turning this into something invoked on-demand, maybe rewrite this." At least they know I knew it'd break, LOL.

7 comments

You're doing the lords work. I often get pushback on doing this with some variation of "comments bad, code should be self-documenting". This is unwise, because there are "what code does" and "why code does" comments, but this turns out to be to nuanced to battle the meme.
“Why” isn’t a property of the code itself, though; it’s rather a property of the process of creating the code.

Because of this, my personal belief is that the justification for any line of code belongs in the SCM commit message that introduced the code. `git blame` should ultimately take you to a clear, stand-alone explanation of “why” — one that, as a bonus, can be much longer than would be tolerated inline in the code itself.

Of course, I’m probably pretty unusual, in that I frequently write paragraphs of commit message for single-line changes if they’re sufficiently non-obvious; and I also believe that the best way to get to know a codebase isn’t to look at it in its current gestalt form, but rather to read its commit history forward from the beginning. (And that if other programmers also did that, maybe we’d all write better commit messages, for our collective future selves.)

I think there's room for the comments having a brief explanation and a longer one in the commit message. Sometimes people need the callout to go looking at the history because otherwise they might not realize that there's anything significant there at all.
The "brief explanation" is what the commit message's subject line is for :) Just turn on "show `git blame` beside code" in your IDE, and you'll get all the brief explanations you could ever want!

...but no, to be serious for a moment: this isn't really a workable idea as-is, but it could be. It needs discoverability — mostly in not just throwing noise at you, so that you'll actually pay attention to the signal. If there was some way for your text editor or IDE to not show you all the `git blame` subject lines, but just the ones that were somehow "marked as non-obvious" at commit time, then we could really have something here.

Personally, I think commit messages aren't structured enough. Github had the great idea of enabling PR code-reviewers to select line ranges and annotate them to point out problems. But there's no equivalent mechanism (in Github, or in git, or in any IDE I know of) for annotating the code "at commit time" to explain what you did out-of-band of the code itself, in a way that ends up in the commit message.

In my imagination, text-editors and IDEs would work together with SCMs to establish a standard for "embeddable commit-message code annotations." Rather than the commit being one long block of text, `git commit -p` (or equivalent IDE porcelain) would take you through your staged hunks like `git reset -p` does; but for each hunk, it would ask you to populate a few form fields. You'd give the hunk a (log-scaled) rating of non-obviousness, an arbitrary set of /[A-Z]+/ tags (think "TODO" or "XXX"), an eye-catching one-liner start to the explanation, and then as much additional free-text explanation as you like. All the per-hunk annotations would then get baked into a markdown-like microformat that embeds into the commit message, that text-editors/IDEs could recognize and pull back out of the commit message for display.

And then, your text-editor or IDE would:

1. embed each hunk's annotation-block above the code it references (as long as the code still exists to be referenced — think of it as vertical, hunk-wise "show `git blame` beside code");

2. calculate a visibility score for each annotation-block based on a project-config-file-based, but user-overridable arbitrary formula involving the non-obviousness value, the tags, the file path, and the lexical identifier path from the syntax highlighter (the same lexical-identifier-path modern `git diff` gives as a context line for each diff-hunk);

3a. if the visibility score is > 1, then show the full annotation-block for the hunk by default;

3b. else, if the visibility score is > 0, then show the annotation-block folded to just its first line;

3c. else, hide the annotation-block (but you can still reveal the matching annotation with some hotkey when the annotated lines are focused.)

Of course, because this is just sourced from (let's-pretend-it's-immutable) git history, these annotation-block lines would be "virtual" — i.e. they'd be read-only, and wouldn't have line-numbers in the editor. If the text-editor wants to be fancy, they could even be rendered in a little sticky-note callout box, and could default to rendering in a proportional-width font with soft wrapping. Think of some hybrid of regular doc-comments, and the editing annotations in Office/Google Docs.

---

...though, that's still not going as far as I'd really like. My real wish (that I don't expect to ever really happen) is for us to all be writing code as a series of literate codebase migrations — where your editor shows you the migration you're editing on the left, and the gestalt codebase that's generated as a result of running all migrations up to that one on the right, with the changes from the migration highlighted. You never directly edit the gestalt codebase; you only edit the migrations. And the migrations are what get committed to source-control — meaning that any code comments are there to be the literate documentation for the changes themselves; while the commit messages exist only to document the meta-level "editing process" that justifies the inclusion of the change.

Why? Because the goal is to structure the codebase for reading. Such a codebase would have one definitive way to learn it: just read the migrations like a book, front to back; watching the corresponding generated code evolve with each migration. If you're only interested in one component, then filter for just the migrations that make changes to it (`git log -S` style) and then read those. And if you, during development, realize you've just made a simple typo, or that you wrote some confusing code and later came up with a simpler way to express the same semantics — i.e. if you come up with code that "you wish you could have written from the start" — then you should go back and modify the earlier migration so that it's introduced there, so that new dev reading the code never has to see you introduce the bad version and then correct it, but instead just sees the good version from the start.

In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.

I don't understand the focus on commit messages. I never read the git log. You can't assume anyone reading the code has access to the commit history or the time to read it. The codebase itself should contain any important documentation.
> I never read the git log.

Well, no, because 1. it's not useful, because 2. most people never write anything useful there (which is a two-part vicious cycle), and 3. editors don't usefully surface it.

If we fix #3; and then individual projects fix #2 for themselves with contribution policies that enforce writing good commit messages; then #1 will no longer be true.

> You can't assume anyone reading the code has access to the commit history or the time to read it.

You can if you're the project's core maintainer/architect/whoever decides how people contribute to your software (in a private IT company, the CTO, I suppose.) You get to decide how to onboard people onto your project. And you get to decide what balance there will be between the amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.

> The codebase itself should contain any important documentation.

My entire point is that commit messages are part of "the codebase" — that "the codebase" is the SCM repo, not some particular denuded snapshot of an SCM checkout. And that both humans and software should take more advantage of — even rely upon! — that fact.

> I never read the git log.

Imho, you are missing out on a great source of insight. When I want to understand some piece of code, I usually start navigating the git log from git blame. Even just one-line commit messages that refer to a ticket can help understanding tremendously. Even the output of git blame itself is helpful. You can see which lines changed together in which order. You see, which colleague to ask questions.

One could accomplish a part of your vision, today, by splitting commits into smaller commits. Then you have the relation between hunks of changes and text in the commit message on a smaller level. Then you can use branches with a non-trivial commit message in the merge commit to document the whole set of commits.

As far as I know changes to the Linux kernel are usally submitted as a series of patches, i.e. a sequence of commits. I.e. a branch, although it is usually not represented as git branch while submitting.

I disagree.

WHY: Using _______ sort because at time of writing code, the anticipated sets are ... and given this programming language and environment ... this tends to be more performant (or this code was easiest and quickest to write and understand).

This way when someone later come along and says WTF?! They know why or at least some of the original developers reasoning for choosing that code implementation.

Completely agree. I find this attitude toward (against?) comments is common in fields where code churn is the norm and time spent at a company rarely exceeds a year or two. When you have to work on something that lasts more than a few seasons, or are providing an API, comments are gold.
Concurrence: the ability of the code to be self-documenting ends at the borders of the code itself. Factors outside of the code that impose requirements on the code must be documented more conventionally. The "sleep 10 seconds to (hopefully) ensure that a file has finished downloading" anecdote from upthread is a great example.
I indeed enjoyed.
Self-documenting code is perfectly capable of expressing the "why" in addition to the "what". It's just that often the extra effort and/or complexity required to express the "why" through code is not worth it when a simple comment would suffice.
> Self-documenting code is perfectly capable of expressing the "why" in addition to the "what".

I don't think anymore that's true, at least in a number of areas.

In another life, I've worked on concurrent data structures in java and/or lock-free java code I'd at this point call too arcane to ever write. The code ended up looking deceptively simple and it was the correct, minimal set of code to write. I don't see any way to express the correctness reasoning for these parts of code in code.

And now I'm dealing with configuration management. Code managing applications and systems. Some of these applications just resist any obvious or intuitive approach, and some others exhibit destructive or catastrophic behavior if approached with an intuitive approach. Again, how to do this in code? The working code is the smallest, most robust set of code I can setup to work around the madness. But why this is the least horrible way to approach this, I cannot express that in code. I can express this in comments.

This is a statement which is technically-true (so long as your language of choice has no length on names), but unhelpful since it does not apply in most practical cases.
I like to make sure the "why" is documented, but it's hard to get people to care about that.

I remember a former client tracking me down to ask about a bug that they had struggled to fix for months. There was a comment that I'd left 10 years earlier saying that while the logic was confusing, there was a good reason it was like that. Another developer had come along and commented out the line of code, leaving a comment saying that it was confusing!

Hah, I did one of these just last week. There's some sort of silicon bug or incorrect documentation that causes this lithium battery charger to read the charge current at half of what it should be. This could cause the battery to literally explode, so I left a big comment with lots of warnings explaining why I'm sending "incorrect" config values to the charge controller.

It's absolutely imperative that the next guy knows what the fuck I'm doing by tampering with safety limits.

I like to add the date the why comment was added and the date the comment was last reviewed/veified as still true/neccesary (which will rarely differ because they are seldomly reviewed/re-verified).

COMMENT WRITTEN: 2023-03-21

COMMENT LAST REVIEWED/VERIFIED AS STILL TRUE: 2023-05-04

WHY THIS CODE: This sucks but ...

There was some inscrutable code that had sleep 10 [seconds] in the middle.

It took several hours to figure it out, but the sleep was there in case a file had not finished downloading

I see your "wait for a file to finish downloading", and raise you a "wait before responding because, for one of our clients, if the latency is below a certain threshold, it will assume that the response was a failure and will discard it". That was a fun codebase.
Back in the 90s, we were trying to debug a crash on a customer’s site (using the then miraculous pcAnywhere). We couldn’t figure it out, and in desperation sent a developer that lived in the same country to observe ‘live’. As soon as he watched us reproduce it, he said “the modem was still disconnecting” - he could hear it, and we couldn’t. The solution, of course, was a sleep statement.
shudder
I like to write frequent comments, so that I can just scan the comments (only) of a function to know how and why it does what it does.
The last part is also useful. Tells the next person where to look when they do run into scaling issues, and also tells them that there wasn't some other reason to do it.