Hacker News new | ask | show | jobs
by gregn610 2892 days ago
A killer feature of TOML compared to JSON is that it allows comments. A config file without comments and examples ain't great. I found the double square bracket syntax useful and understandable. Agreed it's not obviously .INI or perfectly elegant but it certainly works and has its use-cases. Anyways, thank you @mojombo!
6 comments

JSON used to support comments but they started being used for unintended things like parsing directives.[0]

It is apparently possible to include, and then remove them before usage apparently but I've never tried.

[0]: https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...

I'm personally doing something similar in a C++ project that I'm working on. I designate comments with a //. When I load the JSON file, I go line by line removing comments as I find them. Once the file is parsed you can feed it into any regular JSON parser (personally I'm a fan of https://github.com/nlohmann/json).

Using JSMin would have been a much better idea. At this point though the only downside is that I haven't yet done this for saving JSON. It's a harder problem but I don't see why it wouldn't also be doable.

edit: Apparently this library allows you to have comments, and will even preserve them when saving the JSON file.

https://github.com/open-source-parsers/jsoncpp

You could use libjsonnet++ instead of a custom reader to gain the extension of using comments. Of course, Jsonnet is a gateway drug to a much more sophisticated superset of JSON.
There's the JSON5 format that supports comments, there are libraries available for parsing and encoding it.
You're welcome, really glad you like it, and good to know the double square brackets make sense to you! It does seem to be a bit polarizing, so we'll keep working on something even better.
> I found the double square bracket syntax useful and understandable

Really? I had kind of the inverse reaction to it. Indeed useful, but definitely not understandable

It's kind of similar to the syntax of uri query strings and the php syntax for appending to arrays, in that an extra pair of square brackets indicates appending:

In php:

    $a[] = 1;
    $a[] = 2;
    # $a == array(1, 2)
In URIs:

    http://example.org/?a[]=1&b[]=2
Compare to TOML

    [[a]]
    item = 1

    [[a]]
    item = 2
That URI syntax is just a convention used by some web frameworks. You can just as well write

  https://example.org?foo=bar&foo=baz
Not for PHP at least, maybe others, the last query arg will win.
{ “_comment”: “The comment goes here!”, “name”: “foo” }
So now your comments are flowing through infrastructure wasting bandwidth and possibly leaking internal configuration details?
Parse private properties in the config files in your build process, if paranoid. I don’t have use cases where I’m sending the full contents of a config across the network. Each property in a config would be explicitly referenced in the application.

And it’s not like there isn’t any potential problems with some YAML or TOML library you import. Rails Yaml comes to mind.

And maybe even breaking code which doesn't what to do with these fields.
Json allows comments, but they have to be in string fields of an object so they survive reserialization.
So then… not actually comments at all?

A nice config parser will error out or at least warn loudly when encountering an unknown parameter. This helps prevent typos.

I used to feel this way, and also used to be frustrated about multi-line strings in JSON. With years of experience now, though, I actually appreciate JSON omitting these features.

Config files should absolutely not have or need comments. If you need them directly in the config file, something is wrong. Applications should document their default settings in a different way, preferably in a README or generated documentation that also explains how to use environment variables to override the defaults. That sort of separate companion doc is the right place for notes about defaults or "why" certain config values exist in the file. The same is true for using JSON to store parameter files, etc. It's actually quite important to keep metadata about the config / params / etc. specifically out of those files, so that they are absolutely nothing but value files. Information about why a file contains those values belongs elsewhere, and it's an anti-pattern IMO to rely on comments in the config / param file.

I have never disagreed with someone more than I do now. =)

Config absolutely needs comments. Context is everything. Comments allow me to explain to other humans why the config is the way it is. Dumping that out to a separate file is begging for it to fall out of sync when there's no comment instructing anyone to go and update the other file. Plus that's just kind of silly.

I would even say config needs comments more than code does. Code can be self-documenting: by using good variable and function names, splitting or combining lines of code, or re-ordering blocks of code you can often make the intent of the code clearer without adding explicit comments. If you do something unexpected in a config file, it likely just shows as setting some name to a magic number or a magic string.
This ... so many things about configuration decisions that make sense at a point in time but can change with versions of a library, OS, server etc.

In past lives I have had to hack around so many different things to make something work they way I intended that I knew there would likely be a better solution to at some point.

As we get closer to things like infrastructure as code and configuration as code being the norm I would like to take this comment to remind people there is no such thing as self commenting code!

Even if this configuration should be “obvious” given constraints today when someone comes back to this months or years from now it’s likely some of those constraints could have changed or been removed completely - ignoring this is how you end up not changing things out of fear that something will break without real understanding.

Comments are almost never a problem unless they’re not updated when significant changes are made

I definitely add way more comments to my projects' default config files than I do to my actual codebase, if we're talking total number of comments.
I disagree. It’s an anti-pattern. For example, if you’re writing an application that loads a default config file to populate parameters at run time, then the software module that loads from the default file is the correct place to document it, because the meaning of defaults is relevant to that source code, not at all to someone reading the parameter file itself. A parameter file is just some blob of stuff.

I agree context is everything, and that’s why it’s a bad idea to embed usage info or instructions about the contents or meaning of a parameter file into that very file.

Somewhere else, something has to choose to load that file, and that is where the documentation belongs (in addition to readable, separate artifacts that are generated from the file).

For example, suppose you need to rewrite the parameter file from YAML to Toml, or you need to add a new layer of nesting and some post-processing logic at load time.

The meaning of these things has no context inside the parameter file itself. It only has meaning at the point some other system consumes it. Another system could consume the exact same file and choose to interpret all the parameters with different meanings in that program, regardless of what any comments says in the param file.

Maybe that makes sense for the app you are writing that is only consumed by others at your company, but if I'm installing your app from a package manager I absolutely do not want to have to read the config loading source code to figure out what all the parameters do. And even if the docs are nicely described in a man page and not in config file comments, I absolutely want to be able to comment on particular parameter changes ("# had to up foo to 18 because bar was frobbing baz at to high a rate").

Even if you use configuration management it's still nice to be able to comment things in config files since they show up both in the source and in the output files, which is very helpful when debugging.

I specifically meant that comments aren’t a good idea in config files when the config files are distributed to any end users as part of some app or package installation.

> “And even if the docs are nicely described in a man page and not in config file comments, I absolutely want to be able to comment on particular parameter changes ("# had to up foo to 18 because bar was frobbing baz at to high a rate").”

This is the exact anti-pattern that happens as a result of relying on comments in the config file. Your goal of adding that comment about why you changed foo directly in the config file is dangerous and is a very bad practice.

Instead, whatever config file it is that you are modifying (whether for third parties to consume or just for own local Postgres or video game or anything), that file should be turned into a proper package. Place it in a version control repo, and readme and usage documentation for the “why” of the parameter values, and make a tool so that if the end user wants that exact config file, they can “install” it.

For any customized overrides of single settings at run time (so specifically not changes to a config file), it should happen via the end user overriding ENV variables, not mucking around in config files and trusting ad hoc comments found inside them.

>I agree context is everything, and that’s why it’s a bad idea to embed usage info or instructions about the contents or meaning of a parameter file into that very file.

If you don't embed those comments you bereft that file of context -- why, as you admitted "is everything".

>Somewhere else, something has to choose to load that file, and that is where the documentation belongs (in addition to readable, separate artifacts that are generated from the file).

Well, that's not really relevant. People who change configuration are 99% of the time not the same as those who write/read/or even have access to the code that reads it.

Imagine Apache or Postgresql configuration for example.

Admins change those all the time, but don't have anything to do with the Apache or Postgresql project, and don't ever care to read their code.

On top of that, my particular settings in some config file, are based on MY local server and needs and my context, and not on something Postgresql or Apache devs will know themselves.

System admins who might change e.g. Postgres config files absolutely should be proficient in looking at documentation or the source code to understand the meaning, and then create separate documentation about their own customized config files (not making others or their future self actually have to read that file to understand why a value was chosen).

But more generally, it’s bad that a lot of applications don’t offer documented ways to modify config through ENV variables.

A config file should always be code reviewed, versioned and checked into SCM. One side effect is that end users, sys admins, etc., should never be given the chance to inject their customizations through locally modifying the base config file. That should be straight disallowed, so that distributing config files (such as the default, or bundles of other settings commonly used in unison) is part of packaging and deployment, and end users use other mechanisms that allow overriding defaults in a case by case manner.

Then you are talking about perhaps a shell script that sets dozens of ENV vars to override what comes from a (never modifiable) config file, and sure you might document the why of your choices in that shell script with comments, and at no point would anyone need or want comments in the actual config file(s).

Suggesting that most developers who use postgres should be familiar with its source code is madness.

Expand this to everything a sysadmin has to manage it just doesn't scale.

I very much disagreed with you two comments up, but this more fleshed out response actually makes a lot of sense. You just moved me from "I'm going to add a build rule that strips comments from all my JSON files so that I can have comments in my configs" to "The next time I want comments in JSON, I'm going to see if what mlthoughts2018 said makes sense in this case. Do I really need to comment this JSON file, or would it be better to put the comments somewhere else instead?"

We'll see how it goes!

Cheers

You're talking about comments that document the semantics of fields in a config file. Most other people here are talking about comments that describe why the _specific values_ present in a _specific file_ were chosen.

Whether documenting semantics of fields belongs in a config file or code (I say both!), it simply doesn't make any sense to say that comments about why specific values in specific files are the way they are belong in the code that parses them. (How can the parsing code have knowledge of all config files out there?)

No, I am talking specifically about comments or documents that explain the "why" behind parameter choices. For canonical defaults that are shipped with an app, this should be documented in a user guide and in the appropriate parts of the source code, not inside the config file (which most users would never care to read, and which would be worse to read for most developers who might care or change it).

For local config files that are not shipped with the package, they should still be source controlled in a separate repo (yes, even if you personally are the only one using it, like your local Postgres configuration or something), and you should use a good practice to 'deploy' any changes to your config from the repo into the actual location where the application can recognize that config -- meaning that documentation about the "why" of the parameters once again should absolutely not be embedded inside the config file itself.

If they're not in the config file, where exactly are they then?

A text file sitting next to the config file? I can't see any benefit to that arrangement over just using comments in the file.

Comments that exist in the file in the repo but get stripped out by the deploy process? Again, there seems to be no point to doing this.

Commit log messages? I agree commit logs can sometimes be valuable to see the context of a change, but 1) they're fundamentally about documenting _changes_, not the contents themselves, and 2) the UI is really clunky: git blame, find the line you want context for, then git show on that commit.

You make some interesting assertions, but 1) they contradict everything I have learned from painful experience and 2) they seem to make no sense.

> It's actually quite important to keep metadata about the config / params / etc. specifically out of those files, so that they are absolutely nothing but value files. Information about why a file contains those values belongs elsewhere

Okay, I'll bite: Why? In every area of programming, we learn that mental context changes are harmful. We learn to avoid gotos, not abuse exceptions, add meaningful comments, and seperate concerns all in large part so when we look at a file, we can understand what's going on without referencing other files.

Now you say "oh, unless there are some config values in that file, then it's important for it to be as cryptic as possible!" Surely you see why that sounds a bit odd? Should we also base64 encode the file? Do you also hate descriptive variable names in config files?

>Config files should absolutely not have or need comments.

Well, that's like your opinion, man.

If I have something set to some value in my configuration, I want those reading the configuration file (not parsing it, reading it in a text editor, e.g. to make an edit) to know why it's so.

>Applications should document their default settings in a different way

Comments in configuration files are not there to explain default settings, but to document why a setting (default or not, but usually already edit to suit your specific environment) has the value you set it to.

>Information about why a file contains those values belongs elsewhere, and it's an anti-pattern IMO to rely on comments in the config / param file.

That's a statement, not an argument. Why does it "belong elsewhere"? So that you have an extra layer, that few will bother to check?

> I actually appreciate JSON omitting these features.

I guess this is right if config files are only machine read and written. When I am prototyping, config auto-gen tends to come pretty late in the project (for me), and I like being able to comment things.

I also like types and time-date parameters (which JSON doesn't provide unambiguously), so by the time I've done, I've reinvented YAML+, and that's a mess, so I actually appreciate TOML quite a bit... (At the very least - I get to be a retard[0] about config later in the process)

[0] To the inevitable person that's going to call me an able-ist for using the term, I'm using the term in a self-denigrating way, ...

> "I guess this is right if config files are only machine read and written."

No, I meant omitting these features is very useful for humans reading and writing config files. Comments utterly don't belong in config files. They belong in the sections of code that load specific config files and convert their contents into defaults or parameters.

A config file is just some file. Its contents have no conventional meaning. It only takes on a meaning in the context of the specific system that uses it.

Hugely disagree. Config files will modified by non-experts of the application 1000x as often as the actual developer of the application. Those people won't and often can't go look at the code.

Also, it can be really hard to find where exactly a configuration value is used. You may have to trace through a ton of code to find the place, and then you can't be sure that's the only place it's used.

Configuration comments are crucial, both for onboarding new users (explaining what the default is if you don't set a value, explaining what the configuration value actually controls, etc) and for experienced users to tell others why this esoteric configuration is set the way it is.

A configuration file is literally just a stack of magic numbers and strings. Why are we setting threadpool to 10 and not 1000? Why are we disabling X feature? What the heck does TPS_Report=true do?

You need comments.

I disagree strongly. Config files being modified by anyone should be going through code review. The risk of not understanding while at the same time modifying things is extremely low.

Plus, the documentation in the application code that loads the config and manipulates would function as the exact same reference documentation for any developer trying to understand how the config is used or why a choice is made.

This prevents the documentation about what the config is supposed to be used for from being coupled with the implementation detail of what particular config file looks like, what language it’s written in, etc.

Just as you say, a config file is a stack of magic constants. They have no meaning at all sitting in that file. The place to look for their meaning is the documentation of the code that loads the file, which should tell the user everything they need to know about modifying or providing their own file.

IF you have a 500k LOC software project... how the heck is an SRE/devops person going to figure out where in that code a specific configuration item is going to be used? They're not. This is why documentation is essential for projects. You could keep configuration documentation in a separate file... but that only helps for what the config does. It can't help you figure out why Bill (who left the company a while back) set ThreadMax to 650 when he changed the code 6 months ago. There cold be a commit message that references it, but that's more disconnected from the change that just slapping a comment on top that says why.

I agree that code review for configuration changes is necessary. That same code review process can ensure that the comments in the config file are also correct.

> Config files being modified by anyone should be going through code review.

Yeah, I'll put in a PR for my local Transmission config file and see how far that gets me ;)

Config files should be understandable and readable by the end users so they can customize their local installs.

>They belong in the sections of code that load specific config files and convert their contents into defaults or parameters.

And what are users who don't have access to the source, or who aren't programmers, supposed to do?

Somehow these people are reading Toml files of config? That’s silly.
Do you expect a systems administrator to be familiar with the source of every software he maintains?
Non-open source software.

Ops teams deploying a service.

Programmers who don't know the language a program is written in.

Heck what about a config file for a game? If setting the resolution it'd be nice if valid values are shown in the comments.

The argument is about comments in JSON btw.

That might be correct. I tend to want to localize explanation at the edge. I've got to think about this. Thank you.