Hacker News new | ask | show | jobs
by mlthoughts2018 2892 days ago
I used to feel this way, and also used to be frustrated about multi-line strings in JSON. With years of experience now, though, I actually appreciate JSON omitting these features.

Config files should absolutely not have or need comments. If you need them directly in the config file, something is wrong. Applications should document their default settings in a different way, preferably in a README or generated documentation that also explains how to use environment variables to override the defaults. That sort of separate companion doc is the right place for notes about defaults or "why" certain config values exist in the file. The same is true for using JSON to store parameter files, etc. It's actually quite important to keep metadata about the config / params / etc. specifically out of those files, so that they are absolutely nothing but value files. Information about why a file contains those values belongs elsewhere, and it's an anti-pattern IMO to rely on comments in the config / param file.

4 comments

I have never disagreed with someone more than I do now. =)

Config absolutely needs comments. Context is everything. Comments allow me to explain to other humans why the config is the way it is. Dumping that out to a separate file is begging for it to fall out of sync when there's no comment instructing anyone to go and update the other file. Plus that's just kind of silly.

I would even say config needs comments more than code does. Code can be self-documenting: by using good variable and function names, splitting or combining lines of code, or re-ordering blocks of code you can often make the intent of the code clearer without adding explicit comments. If you do something unexpected in a config file, it likely just shows as setting some name to a magic number or a magic string.
This ... so many things about configuration decisions that make sense at a point in time but can change with versions of a library, OS, server etc.

In past lives I have had to hack around so many different things to make something work they way I intended that I knew there would likely be a better solution to at some point.

As we get closer to things like infrastructure as code and configuration as code being the norm I would like to take this comment to remind people there is no such thing as self commenting code!

Even if this configuration should be “obvious” given constraints today when someone comes back to this months or years from now it’s likely some of those constraints could have changed or been removed completely - ignoring this is how you end up not changing things out of fear that something will break without real understanding.

Comments are almost never a problem unless they’re not updated when significant changes are made

I definitely add way more comments to my projects' default config files than I do to my actual codebase, if we're talking total number of comments.
I disagree. It’s an anti-pattern. For example, if you’re writing an application that loads a default config file to populate parameters at run time, then the software module that loads from the default file is the correct place to document it, because the meaning of defaults is relevant to that source code, not at all to someone reading the parameter file itself. A parameter file is just some blob of stuff.

I agree context is everything, and that’s why it’s a bad idea to embed usage info or instructions about the contents or meaning of a parameter file into that very file.

Somewhere else, something has to choose to load that file, and that is where the documentation belongs (in addition to readable, separate artifacts that are generated from the file).

For example, suppose you need to rewrite the parameter file from YAML to Toml, or you need to add a new layer of nesting and some post-processing logic at load time.

The meaning of these things has no context inside the parameter file itself. It only has meaning at the point some other system consumes it. Another system could consume the exact same file and choose to interpret all the parameters with different meanings in that program, regardless of what any comments says in the param file.

Maybe that makes sense for the app you are writing that is only consumed by others at your company, but if I'm installing your app from a package manager I absolutely do not want to have to read the config loading source code to figure out what all the parameters do. And even if the docs are nicely described in a man page and not in config file comments, I absolutely want to be able to comment on particular parameter changes ("# had to up foo to 18 because bar was frobbing baz at to high a rate").

Even if you use configuration management it's still nice to be able to comment things in config files since they show up both in the source and in the output files, which is very helpful when debugging.

I specifically meant that comments aren’t a good idea in config files when the config files are distributed to any end users as part of some app or package installation.

> “And even if the docs are nicely described in a man page and not in config file comments, I absolutely want to be able to comment on particular parameter changes ("# had to up foo to 18 because bar was frobbing baz at to high a rate").”

This is the exact anti-pattern that happens as a result of relying on comments in the config file. Your goal of adding that comment about why you changed foo directly in the config file is dangerous and is a very bad practice.

Instead, whatever config file it is that you are modifying (whether for third parties to consume or just for own local Postgres or video game or anything), that file should be turned into a proper package. Place it in a version control repo, and readme and usage documentation for the “why” of the parameter values, and make a tool so that if the end user wants that exact config file, they can “install” it.

For any customized overrides of single settings at run time (so specifically not changes to a config file), it should happen via the end user overriding ENV variables, not mucking around in config files and trusting ad hoc comments found inside them.

>I agree context is everything, and that’s why it’s a bad idea to embed usage info or instructions about the contents or meaning of a parameter file into that very file.

If you don't embed those comments you bereft that file of context -- why, as you admitted "is everything".

>Somewhere else, something has to choose to load that file, and that is where the documentation belongs (in addition to readable, separate artifacts that are generated from the file).

Well, that's not really relevant. People who change configuration are 99% of the time not the same as those who write/read/or even have access to the code that reads it.

Imagine Apache or Postgresql configuration for example.

Admins change those all the time, but don't have anything to do with the Apache or Postgresql project, and don't ever care to read their code.

On top of that, my particular settings in some config file, are based on MY local server and needs and my context, and not on something Postgresql or Apache devs will know themselves.

System admins who might change e.g. Postgres config files absolutely should be proficient in looking at documentation or the source code to understand the meaning, and then create separate documentation about their own customized config files (not making others or their future self actually have to read that file to understand why a value was chosen).

But more generally, it’s bad that a lot of applications don’t offer documented ways to modify config through ENV variables.

A config file should always be code reviewed, versioned and checked into SCM. One side effect is that end users, sys admins, etc., should never be given the chance to inject their customizations through locally modifying the base config file. That should be straight disallowed, so that distributing config files (such as the default, or bundles of other settings commonly used in unison) is part of packaging and deployment, and end users use other mechanisms that allow overriding defaults in a case by case manner.

Then you are talking about perhaps a shell script that sets dozens of ENV vars to override what comes from a (never modifiable) config file, and sure you might document the why of your choices in that shell script with comments, and at no point would anyone need or want comments in the actual config file(s).

Suggesting that most developers who use postgres should be familiar with its source code is madness.

Expand this to everything a sysadmin has to manage it just doesn't scale.

> “Suggesting that most developers who use postgres should be familiar with its source code is madness.”

This is a glib strawman that has nothing to do with what I said at all.

If you intend to modify config files directly, then you should be able to find the relevant documentation, source code comments, etc., on what the config entries mean or why a value is chosen as the default without needing comments in the config file. This is in no way related to your wild idea that “most developers who use postgres should be familiar with its source code..” (which is not at all what I said and is not at all a reasonable characterization. Reading a tiny bit of source code or docs to find one type of comments is utterly and completely different than your gross mischaracterization that it’s somehow a claim that people would need to be deeply familiar with a bunch of source code.)

I very much disagreed with you two comments up, but this more fleshed out response actually makes a lot of sense. You just moved me from "I'm going to add a build rule that strips comments from all my JSON files so that I can have comments in my configs" to "The next time I want comments in JSON, I'm going to see if what mlthoughts2018 said makes sense in this case. Do I really need to comment this JSON file, or would it be better to put the comments somewhere else instead?"

We'll see how it goes!

Cheers

You're talking about comments that document the semantics of fields in a config file. Most other people here are talking about comments that describe why the _specific values_ present in a _specific file_ were chosen.

Whether documenting semantics of fields belongs in a config file or code (I say both!), it simply doesn't make any sense to say that comments about why specific values in specific files are the way they are belong in the code that parses them. (How can the parsing code have knowledge of all config files out there?)

No, I am talking specifically about comments or documents that explain the "why" behind parameter choices. For canonical defaults that are shipped with an app, this should be documented in a user guide and in the appropriate parts of the source code, not inside the config file (which most users would never care to read, and which would be worse to read for most developers who might care or change it).

For local config files that are not shipped with the package, they should still be source controlled in a separate repo (yes, even if you personally are the only one using it, like your local Postgres configuration or something), and you should use a good practice to 'deploy' any changes to your config from the repo into the actual location where the application can recognize that config -- meaning that documentation about the "why" of the parameters once again should absolutely not be embedded inside the config file itself.

If they're not in the config file, where exactly are they then?

A text file sitting next to the config file? I can't see any benefit to that arrangement over just using comments in the file.

Comments that exist in the file in the repo but get stripped out by the deploy process? Again, there seems to be no point to doing this.

Commit log messages? I agree commit logs can sometimes be valuable to see the context of a change, but 1) they're fundamentally about documenting _changes_, not the contents themselves, and 2) the UI is really clunky: git blame, find the line you want context for, then git show on that commit.

You make some interesting assertions, but 1) they contradict everything I have learned from painful experience and 2) they seem to make no sense.

> It's actually quite important to keep metadata about the config / params / etc. specifically out of those files, so that they are absolutely nothing but value files. Information about why a file contains those values belongs elsewhere

Okay, I'll bite: Why? In every area of programming, we learn that mental context changes are harmful. We learn to avoid gotos, not abuse exceptions, add meaningful comments, and seperate concerns all in large part so when we look at a file, we can understand what's going on without referencing other files.

Now you say "oh, unless there are some config values in that file, then it's important for it to be as cryptic as possible!" Surely you see why that sounds a bit odd? Should we also base64 encode the file? Do you also hate descriptive variable names in config files?

>Config files should absolutely not have or need comments.

Well, that's like your opinion, man.

If I have something set to some value in my configuration, I want those reading the configuration file (not parsing it, reading it in a text editor, e.g. to make an edit) to know why it's so.

>Applications should document their default settings in a different way

Comments in configuration files are not there to explain default settings, but to document why a setting (default or not, but usually already edit to suit your specific environment) has the value you set it to.

>Information about why a file contains those values belongs elsewhere, and it's an anti-pattern IMO to rely on comments in the config / param file.

That's a statement, not an argument. Why does it "belong elsewhere"? So that you have an extra layer, that few will bother to check?

> I actually appreciate JSON omitting these features.

I guess this is right if config files are only machine read and written. When I am prototyping, config auto-gen tends to come pretty late in the project (for me), and I like being able to comment things.

I also like types and time-date parameters (which JSON doesn't provide unambiguously), so by the time I've done, I've reinvented YAML+, and that's a mess, so I actually appreciate TOML quite a bit... (At the very least - I get to be a retard[0] about config later in the process)

[0] To the inevitable person that's going to call me an able-ist for using the term, I'm using the term in a self-denigrating way, ...

> "I guess this is right if config files are only machine read and written."

No, I meant omitting these features is very useful for humans reading and writing config files. Comments utterly don't belong in config files. They belong in the sections of code that load specific config files and convert their contents into defaults or parameters.

A config file is just some file. Its contents have no conventional meaning. It only takes on a meaning in the context of the specific system that uses it.

Hugely disagree. Config files will modified by non-experts of the application 1000x as often as the actual developer of the application. Those people won't and often can't go look at the code.

Also, it can be really hard to find where exactly a configuration value is used. You may have to trace through a ton of code to find the place, and then you can't be sure that's the only place it's used.

Configuration comments are crucial, both for onboarding new users (explaining what the default is if you don't set a value, explaining what the configuration value actually controls, etc) and for experienced users to tell others why this esoteric configuration is set the way it is.

A configuration file is literally just a stack of magic numbers and strings. Why are we setting threadpool to 10 and not 1000? Why are we disabling X feature? What the heck does TPS_Report=true do?

You need comments.

I disagree strongly. Config files being modified by anyone should be going through code review. The risk of not understanding while at the same time modifying things is extremely low.

Plus, the documentation in the application code that loads the config and manipulates would function as the exact same reference documentation for any developer trying to understand how the config is used or why a choice is made.

This prevents the documentation about what the config is supposed to be used for from being coupled with the implementation detail of what particular config file looks like, what language it’s written in, etc.

Just as you say, a config file is a stack of magic constants. They have no meaning at all sitting in that file. The place to look for their meaning is the documentation of the code that loads the file, which should tell the user everything they need to know about modifying or providing their own file.

IF you have a 500k LOC software project... how the heck is an SRE/devops person going to figure out where in that code a specific configuration item is going to be used? They're not. This is why documentation is essential for projects. You could keep configuration documentation in a separate file... but that only helps for what the config does. It can't help you figure out why Bill (who left the company a while back) set ThreadMax to 650 when he changed the code 6 months ago. There cold be a commit message that references it, but that's more disconnected from the change that just slapping a comment on top that says why.

I agree that code review for configuration changes is necessary. That same code review process can ensure that the comments in the config file are also correct.

> It can't help you figure out why Bill (who left the company a while back) set ThreadMax to 650 when he changed the code 6 months ago. There cold be a commit message that references it, but that's more disconnected from the change that just slapping a comment on top that says why.

Also worth noting is that not all config files are committed to version control as-is. If a deployment process bakes the config file from variables, it can be even more disconnected and difficult to find the change.

The size of the project is a red herring in your comment. People document command line options, ENV settings, etc., in huge projects all the time. It has absolutely no bearing on whether comments belong inside of config files.
> Config files being modified by anyone should be going through code review.

Yeah, I'll put in a PR for my local Transmission config file and see how far that gets me ;)

Config files should be understandable and readable by the end users so they can customize their local installs.

This is silly. If an app like Transmission expects end users to modify a config file locally as the means to add customized settings, that’s a seriously bad design. Why not provide documentation about command-line arguments or ENV variables it would look for for end user customizeability. A settings file where you need to know the meaning as you read the file is among the worst ways to solve it.
>They belong in the sections of code that load specific config files and convert their contents into defaults or parameters.

And what are users who don't have access to the source, or who aren't programmers, supposed to do?

Somehow these people are reading Toml files of config? That’s silly.
Do you expect a systems administrator to be familiar with the source of every software he maintains?
No, I expect the system administrator to refer to a readme, user guide or API doc that explains how to inject custom options at the command line or through a web API, etc., and absolutely never by mutating a config file outside of version control with code review from the team that maintains that specific type of config.
Non-open source software.

Ops teams deploying a service.

Programmers who don't know the language a program is written in.

Heck what about a config file for a game? If setting the resolution it'd be nice if valid values are shown in the comments.

The argument is about comments in JSON btw.

Why would these people be modifying config files outside of code review? That’s horrible if true, and would entirely go against most best practices (e.g. 12 Factor use of ENV vars).

If you want to provide config customizeability as part of an API or interface to third party users (and you should!) then doing it by comments in a file that users modify is insanely bad. Instead, document usage instructions for overriding defaults with ENV vars — customization should never involve mangling a config file outside of version control, and absolutely not by third party devs, system administrators, etc.

In fact, I think your response highlights exactly why relying on comments in config files is such a bad anti-pattern.

That might be correct. I tend to want to localize explanation at the edge. I've got to think about this. Thank you.