by fdsary 4220 days ago
Choosing a unified format for configurations is an interesting task, because they all suck a lot (hehe). XML is too verbose to be nice to work with. Plain text files with config flags delimited by newlines lead to the program in the end implementing a small scripting language for config files.

JSON is pretty nice, but also a bit clunky. A lot of {:} all the time.

Personally, I think the nicest and most expressive way is S-expressions. I'm no lisper, but you have to admit sexprs are expressive, easy to read, and can be run as functions if the program knows lisp.

    {  
    "configFiles": "in JSON",  
    "wouldLook": {"like":"this"}  
    }  

    (while sexpr
      (could look)
      (even nicer))
11 comments

Regarding the configuration formats, I would recommend TOML.

In fact, I hated it because it seemed to be just "another standard" that unnecessarily adds dependency.

But after using Rust, whose package manager forces me to write the package configuration in TOML, I found the format is more like "JSON designed for configuration files." As you said, JSON is full of ':', '{' and '}'. That is natural, because it started as a data interchange format and was not meant for writing by hand. TOML solves this problem very well IMHO.

Also, unlike XML that requires an external structure to validate the types of the values, TOML values have types, just like JSON.

There's also http://p3rl.org/JSONY which is explicitly "JSON re-imagined for configuration files" by myself and ingy (the creator of YAML).

The key win for us is that all JSON is valid JSONY, so you can use the human friendly syntax for hand editing but still just dump out a JSON object into the file if you're generating the entire thing from code.

Can you give some examples of TOML-based Rust package configurations that you think are good? My experience using TOML for config files has been very different.

IMO, using YAML for the config format is still the best option for most use-cases at this point. (Easy to support both JSON and YAML.) But I do think that it would be great to have a YAML successor that standardizes some of the cool things people are doing with custom YAML parsers, and also removes outdated and unnecessarily complex parts of the original standard.

Is there some reason for the commas in lists? I'm starting to like s-exprs more and more for configs and am starting to think it might be because of a lack of love for commas.
It just seems that TOML (and similar formats) would be more of an interface rather than the configuration itself.
NixOS got this right.

Declarative configuration is great in theory but it doesn't scale well. See Amazon's CloudFormation configuration files for what configuration in JSON looks like. (For example, https://s3-us-west-2.amazonaws.com/cloudformation-templates-...) It quickly gets hard to manage the syntax and balance all the quotes, parentheses, braces and brackets, with careful use of whitespace being critical for even a hope of understanding it. When you add in semantics, it gets even worse. Take the "Ref" construct in CloudFormation to work around the fact that JSON doesn't have a way to name something and refer to it elsewhere.

As the amount of configuration grows, you have to introduce abstractions to encapsulate common patterns, break out of deep nesting, name things etc. I guess it's possible to do that declaratively, but that leads to things like XSLT. (Or, open the CloudFormation JSON linked above and search for "Fn::". It makes me shudder).
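
To illustrate why a reference construct has to be bolted on top of JSON, here is a toy resolver in Python. It is only a sketch: the `Definitions` and `Settings` key names and the `resolve` function are invented for this example, and the real CloudFormation "Ref" semantics are considerably richer than a plain lookup.

```python
import json

# Toy document: "Definitions" and "Settings" are hypothetical key names,
# not CloudFormation's schema.
DOC = json.loads("""
{
  "Definitions": {"WebPort": 8080, "Zone": "us-west-2a"},
  "Settings": {
    "listener": {"port": {"Ref": "WebPort"}},
    "zones": [{"Ref": "Zone"}, "us-west-2b"]
  }
}
""")

def resolve(node, definitions):
    """Recursively replace {"Ref": name} objects with the named definition."""
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            return definitions[node["Ref"]]
        return {key: resolve(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, definitions) for item in node]
    return node

resolved = resolve(DOC["Settings"], DOC["Definitions"])
print(json.dumps(resolved, indent=2))
```

The JSON parser knows nothing about the reference; every consumer of the file needs this extra pass, which is the kind of semantics-on-top-of-syntax the comment is complaining about.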

S-expressions would indeed be better than JSON, but they still wouldn't scale without macros. So either all programs would have to know lisp, or you need a dedicated program for executing the configuration.

That's what NixOS has. It does all configuration via "nix expressions" which are written in a lazy, functional language. The functional bit allows for the abstraction necessary to make large amounts of configuration readable and maintainable. The lazy bit makes executing large configurations efficient, given that you're usually only interested in certain parts of it at any one time.

The result is that you can write a configuration that describes a cluster of machines from top to bottom: from how many app servers to put behind the load-balancer to which version of zlib to link node.js against.

It takes a bit of effort to learn nix, and for many, the functional paradigm will be strange and scary. But it's so worth it.

Could you give an example of a NixOS config file that contains as much complexity as the CloudFormation JSON config file, but does so in a more readable way?
Take a look at GUIX (GNU Guile ~wrapper~ around Nix) http://www.gnu.org/software/guix/manual/guix.html#Programmin...

That said, s-exps can be cryptic depending on how you design your DSL, but overall they're 80% great to read and easy to process, plus you get easy chopping of s-exps in s-exp-aware editors (not pointing at some very old, very large one here).

I've (purposely) ignored the config files and systemd wars.

But that means I am truly ignorant - what, in fact, is the big problem with plain old unix-style config files ?

Note that I am not asking about what is wrong with how they are scheduled and run - I get that controversy. But what is wrong with the files themselves ?

I know I have absolutely hated every single interaction I have had with plists and launchd and whatever else on OSX ...

There are three things which I've heard a lot:

1. Every program rolls their own format, so you end up with a bunch of custom rules for configuring each application on the system. This makes life tedious for anyone automating system management and, on a practical level, increases the chance of human error when someone forgets the precise ways in which their experience with something else isn't quite portable.

2. There are a wide range of supported features – e.g. interaction with environmental variables or external defines / defaults, ability to reference other config values, ability to either include a file or merge config parsed from a different location – and this leads to more tooling to paper over the gaps.

3. Hierarchy and discovery: in Windows or OS X, the preference system has the concept of user, system and domain config so you can set sane defaults at the domain level but a user can still override one of them just for their account. (There's another interesting tangent here where it's useful to have config supplied with a package which can be selectively overridden by a sysadmin without forking the entire file)

This works, at least for certain values of "works", because there's a single standardized API for working with the registry or plist files.

I generally dislike the details of plists but the fact that I can e.g. run "defaults read [scope] key value" and get the same effective config value which an application will use without knowing at which level it was last set or even whether the file was the older XML plist format or the newer binary one is really handy.
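
A rough sketch of that layered lookup in Python, using `collections.ChainMap`. The three dictionaries, their keys, and the precedence order (user over system over domain) are assumptions made for illustration; this is not how `defaults` or the registry is actually implemented.

```python
from collections import ChainMap

# Hypothetical settings for each level; the keys are made up for this sketch.
domain_defaults = {"proxy": "proxy.example.com", "screensaver_minutes": 10}
system_settings = {"screensaver_minutes": 5}
user_settings = {"screensaver_minutes": 30}

# ChainMap searches the mappings left to right, so the most specific level
# (user) comes first and the broadest (domain) comes last.
effective = ChainMap(user_settings, system_settings, domain_defaults)

def lookup(key):
    """Return the effective value without caring which level set it."""
    return effective[key]

print(lookup("screensaver_minutes"))
print(lookup("proxy"))
```

The caller asks one question and gets one answer; which file or level supplied it stays an implementation detail of the lookup API.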

The fact that a config tool can simply run "defaults write [scope] key value" and not have to worry about those same mechanics is even nicer for sparing me time worrying about e.g. editing a text file safely, knowing whether I can just dump a new value at the end or must purge old values first (oops, FooD stops at first value while BarD uses the last and BazD does either depending on quirky rules you'll just have to learn…), etc.

Similarly, launchd has a regrettable config syntax but I've forgiven it a lot because it's a relief to use the same tool everywhere rather than needing the accumulation of different hacks and kludges which sysv init/cron or, absurdly, upstart still require in practice.

What would be really nice would be someone building a library with plenty of language bindings which supported configuration based on the classic Unix config files but with a defined syntax and well-documented strategies for error handling and hierarchy, and who started working to help popular programs migrate over on the next backwards-incompatible release. There's certainly no technical reason why any of the good parts should require Apple's XML cruft or the accumulated bad culture around the Windows registry.

"3. Hierarchy and discovery: in Windows or OS X, the preference system has the concept of user, system and domain config so you can set sane defaults at the domain level but a user can still override one of them just for their account. (There's another interesting tangent here where it's useful to have config supplied with a package which can be selectively overridden by a sysadmin without forking the entire file)"

On *nix, system-wide config files are usually under /etc, and they can often be overridden by dotfiles in a user's home directory.

One example of a way a user can override global/system configs without forking the entire file is using ~/.Xdefaults.

I hear you about there being no common standard, however. It would be nice if there was one good config file format to rule them all, that could please everyone.

But, since you can't please everyone, it's better to have choice than to have someone's (Microsoft's/Apple's/Ubuntu's) idea of "good" forced down everyone's throat.

>But, since you can't please everyone, it's better to have choice than to have someone's (Microsoft's/Apple's/Ubuntu's) idea of "good" forced down everyone's throat.

Sometimes it's more about not wanting to spend another 5h on config files for some daemon on every new system you have. Using Ubuntu's or Apple's defaults is nice when you want something that works and lets you start getting shit done now.

> plain old unix-style config files

Like what? /etc/X11/xorg.conf is a different format from /etc/openldap/ldap.conf is a different format from /etc/crontab is a different format from /etc/nsswitch.conf.

Unifying configuration files under a single common format would allow for less complexity in configuration management tools - you could just create a data structure and serialise it to a file, rather than dealing with text-based templates with countless edge cases.
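
As a sketch of what "create a data structure and serialise it" could look like, here is a few lines of Python with JSON standing in for whatever common format gets picked. The `desired` settings and the `render` helper are hypothetical.

```python
import json

# Hypothetical settings a management tool might hold in memory.
desired = {
    "server": {"host": "example.com", "port": 80},
    "proxy": {"host": "proxy.example.com", "port": 8080},
}

def render(settings):
    """Serialise settings deterministically so repeated runs produce identical files."""
    return json.dumps(settings, indent=2, sort_keys=True) + "\n"

text = render(desired)

# Round-tripping through a stock parser recovers the same structure,
# which a hand-written text template cannot promise.
assert json.loads(text) == desired
print(text)
```

Quoting and escaping are the serialiser's job here, so the edge cases that bite text templates never reach the tool author.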

And yet elsewhere, domain specific languages are preferred.

I'd love to see the One Config File Format To End Config File Formats that supports xorg, ldap, crontab, nsswitch, apache, and anything else. Sure, use something like JSON, XML, INI, or YAML for this, for which the format is regular and parsable with an off-the-shelf parser, but then the domain specific stuff just moves into, for example, the key names. You've gained nothing in terms of maintenance or automatic editablity, but you have gained consistent use of whitespace and punctuation characters.

"Creating a data structure and serializing it to a file" is doable text-based formats. Admittedly, templates are troublesome, but that's because you shouldn't be using templates anyway. Templates are an attempt to avoid understanding the domain specific language but still maintain the power of the domain specific language. I've never had good luck with this, which is why I mainly use configuration management tools, like puppet, to keep a fleet of machines in a consistent state by putting fully formed files in place. Templating takes significant care.

> And yet elsewhere, domain specific languages are preferred.

Yes, but that's because the DSLs actually provide something of value. Having umpteen different ways to store a list of values under a key, for example, doesn't solve anything.

> the domain specific stuff just moves into, for example, the key names

Anything that wants to manage the system already needs to know the key names, etc - implementing a single config file format is significantly reducing complexity because they no longer have to implement config file parsers/serialisers.

> I mainly use configuration management tools, like puppet, to keep a fleet of machines in a consistent state by putting fully formed files in place

These tools are currently primarily based on templating. If you ever have reason to write your own resources for them (if you have to edit a config file for something obscure or in-house), you're going to have to write a text-based template to convert from your nice Puppet data structure to the config file format - and probably get it wrong on edge cases.

Best I can tell, they don't play well with GUI config tools...
The YAML format mentioned in the slides actually has the least syntactic clutter, however the parsing is quite difficult
Well, that's why you use a library.

What's important is that it's easy to read and write. And that's yet another reason to standardize your language, so that you don't need to trade legibility for parseability, because you'll only need to write the parser once.

The only problem I see with that is that some programs use turing-complete configuration languages, while for most of them that'd be a bug, not a feature. One just can not standardize all *nix tools in one configuration language... maybe two, but not one.

Because it simply is too complex for a format that emphasizes easy readability. Some easy-to-grasp subset of YAML might be a good choice.
I came up with such a format (called BML) and for the same reasons. Here's an example that shows the entirety of the syntax:

    server
      path: /core/www/
      host: example.com
      port: 80
      service: true
      proxy
        host: proxy.example.com
        port: 8080
        authentication: plain
      description
        :Primary web-facing server
        :Provides commerce-related functionality
    
    server
      ...
      proxy host="proxy.example.com" port="8080"
        authentication: plain
Everything is a node, which can have a data value and zero or more child nodes. Nesting is determined by counting the number of indentions. It uses a counter so that if you want indents to be one tab, or two spaces, or four spaces, it will still work. (Being too rigid here makes the syntax very unfun for humans to write.)

Once parsed, element-style (first proxy node) or attribute-style (second proxy node) become identical nodes and are treated the same (but with a flag in case you want to write out a modified file): they are fully interchangeable, so no attribute vs element debates, just use what works best for readability. (This really is critical. Some document types would be ten times as long without attribute-style nodes.)

The syntax has no entities. foo="data" can capture any data that doesn't need quotes or newlines. foo: can capture any data that doesn't need a newline. foo\n:data\n:data can capture any data that doesn't need binary. foo\n:base64\n:base64 can capture absolutely anything.

Node names must be [A-Za-z0-9-.]{1,}, and are case-sensitive. The same node name can appear multiple times at any level, even at the root level. Ambiguity is resolved by the order of appearance for each node.

The data values of nodes are completely unparsed by the markup. The syntax knows no difference between strings, integers, floats, booleans, binary data, arrays, etc. The application parses the text however it wants. The library adds some convenience functions (.text() to strip surrounding whitespace, .integer() to get a number, .boolean() to decode true/yes/on vs false/no/off, etc.)

File format is mandatory UTF-8 (no BOM.) Preferred line feeds are '\n', but '\r\n' is also permitted because Microsoft.

The implementation in C is about 8KB. Since everything has a marker, allocations are not necessary for the node names/values (but you will need to allocate the tree structure, obviously.) There's an accompanying path query syntax (ala XPath) that's another 6KB of code or so.

(All the edge cases are well-defined (mismatched indentation, mixing multi-line and child nodes, etc), but the post is getting a bit long.)

This was the best I could do at minimalism. Removing any functionality it has results in ruling out many use cases.
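
For anyone who wants to play with the idea, here is a rough Python sketch of a parser for the syntax described above. It is not the C implementation mentioned in the post: `parse`, `make_node` and `child` are hypothetical names, the edge cases (mismatched indentation, mixing multi-line data and child nodes) are not handled, the element/attribute flag is omitted, and stripping whitespace from `name: value` data is a simplification of my own.

```python
import re

NAME = r"[A-Za-z0-9.-]+"
HEAD = re.compile(rf"({NAME})")
ATTR = re.compile(rf'\s+({NAME})="([^"\n]*)"')

def make_node(name, value=""):
    return {"name": name, "value": value, "children": []}

def parse(text):
    """Parse an indentation-nested document into a tree of nodes."""
    root = make_node("")
    stack = [(-1, root)]  # (indent width, node) pairs from the root to the current node
    for raw in text.splitlines():
        if not raw.strip():
            continue
        line = raw.lstrip(" \t")
        indent = len(raw) - len(line)
        # Close every open node that is not shallower than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if line.startswith(":"):
            # Multi-line data: append to the enclosing node's value.
            parent["value"] += ("\n" if parent["value"] else "") + line[1:]
            continue
        head = HEAD.match(line)
        if head is None:
            raise ValueError(f"invalid node name in line: {raw!r}")
        node = make_node(head.group(1))
        rest = line[head.end():]
        # Attribute-style children become ordinary child nodes.
        while (attr := ATTR.match(rest)):
            node["children"].append(make_node(attr.group(1), attr.group(2)))
            rest = rest[attr.end():]
        if rest.startswith(":"):
            node["value"] = rest[1:].strip()
        parent["children"].append(node)
        stack.append((indent, node))
    return root

def child(node, name):
    """Return the first child with this name (order of appearance resolves duplicates)."""
    return next(c for c in node["children"] if c["name"] == name)

DOC = """\
server
  host: example.com
  port: 80
  proxy
    host: proxy.example.com
    port: 8080
  description
    :Primary web-facing server
    :Provides commerce-related functionality
server
  proxy host="proxy.example.com" port="8080"
    authentication: plain
"""

tree = parse(DOC)
first, second = tree["children"]
print(child(child(first, "proxy"), "host")["value"])
print(child(child(second, "proxy"), "host")["value"])
```

Both `proxy` nodes end up with the same `host` and `port` children, which is the element-style versus attribute-style equivalence described above. Values stay plain strings; converting `port` to a number is left to the application.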

Haven't seen that before; indeed it is similar. I quite like that.

The limitation I see is, how do you store data values with spaces in them? If it allows quoted values, then how do you store values with quotes in them? And how do you store line feeds? Not having those rules out a lot of use cases.

Hmm, I can't remember about spaces, and you would need them for TXT records. I can't find anything in the documentation; I'd have to do some tests.
I really like this. Wish it would be widely adopted in place of YAML.

Have you written a spec and promoted it at all?

Thanks! I did write up a spec (currently not online after a host move), but I'm very bad at expressing grammars with eg EBNF, so it was a fairly verbose read.

I haven't really promoted it, I don't know how to promote my work tactfully (I tried promoting a few things on r/programming and the mods buried them right off the bat.)

I'd be very happy for any help in this regard, as well as for any suggestions on simplifying the parsing (the edge cases really are annoying to deal with.)

You can bypass the issue by using the amazing Augeas project; a universal configuration API. http://augeas.net/

At the least a universal API provides a midpoint to moving to a universal format.

It really deserves more attention and use!

(One possible reason that it's not is that people might find it a little user-hostile at first.)

JSON is clunky, it's true, but it has advantages. It's very, very widely supported. There is no format versioning, there are no dialects, there is only One True JSON. It's incredibly minimal and supports no more, and no less, than it needs to.

It's not perfect, but it's a very good choice.

But JSON doesn't support comments which are pretty nice in a config file. You can do the ugly comment as property stuff, or strip them out, but you're getting away from the One True Format.
That's true. You could standardise on JSON with comments removed by JSMin... The underlying data would still be standard.
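
A minimal sketch of that idea in Python, assuming `//` and `/* */` comment styles. This is not JSMin itself; `strip_comments` is a hypothetical helper that only removes comments found outside string literals and then hands the result to the stock JSON parser.

```python
import json

def strip_comments(text):
    """Remove // line comments and /* */ block comments that sit outside strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote cannot end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Hypothetical config file with comments.
COMMENTED = """
{
  // listen address
  "host": "example.com",
  "url": "http://example.com/a",  /* slashes inside strings survive */
  "port": 80
}
"""

config = json.loads(strip_comments(COMMENTED))
print(config)
```

The underlying data is still plain JSON, but every tool that reads the file has to agree on the same stripping step, which is the departure from the One True Format raised above.
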
Consider Sendmail as a cautionary tale. On more than one occasion I have seen a busy sysadmin manually edit a .cf file and forget to backport the edit to the .mc file. The next time someone regenerates the .cf from the .mc you have a vanishing edit.

Yes, the sysadmin was clearly at fault. No, this is not a deal breaker, as seen by the longevity of Sendmail. But a config file that gets pushed through a preprocessor has a more complicated life cycle than a "static" equivalent.

If your organization has comprehensively embraced ansible/chef/puppet/etc this may be a non-issue, because all of your configs might be generated. Just food for thought.

> Yes, the sysadmin was clearly at fault. No, this is not a deal breaker, as seen by the longevity of Sendmail. But a config file that gets pushed through a preprocessor has a more complicated life cycle than a "static" equivalent.

Er, only if the application doesn't run the preprocessor itself.

OpenBSD's solution has been to use "pf format" config files for as much as possible. They're all different, but immediately intelligible to anyone who's seen any of them before.
Would be even cooler if "Python-ized" (i.e. if spacing mattered):

    that is
      they could look  
      like this
Significant whitespace in configuration files... what could possibly go wrong.

Didn't we learn something from Makefiles here? Does that 'tab' look like 8 or 4 spaces to you?

This is mentioned a lot by Python newcomers but rarely turns out to be a problem in practice because, unlike GNU Make, Python simply requires you to be consistent throughout the same file.

That strikes me as an even more valuable property for configuration, which is more often accessed by multiple people and tools and more often as a distraction from other work rather than the primary focus.

> Python simply requires you to be consistent throughout the same file.

You or the other team of developers/maintainers/sys-admins looking over or patching this particular file. Using whatever default editor they prefer with whatever config and tabifying/untabifying behaviour that has.

Or maybe they are applying a patch produced on a different system with different tabulation? Or the other 2000 common ways for accuracy to go out the window.

> That strikes me as an even more valuable property for configuration

So you'd like to deliberately make the configuration system more brittle for absolutely no gain?

Yeah. Count me up for anything else. I personally prefer s-expressions, but I would be willing to go the full mile with pretty much any format NOT based on whitespace. Whitespace is bullshit and depending on it even more so.

If your team can't pick a single style and stick with it, it is exceedingly unlikely that whitespace will be your biggest source of problems.
So your argument is that because you should be doing $X anyway, let's make the entire OS's configuration system needlessly brittle, just because you're a python dev and like python's style.

I don't think you understand the philosophy behind FreeBSD: it's about creating a predictable and reliable OS. This goes against those goals. Simple as that.

> > Python simply requires you to be consistent throughout the same file.

I like Python but this is not actually true. You only need to be consistent within a block, or be more deeply indented than the enclosing block.

You can have different indentations for the if and else clauses!
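
A quick way to check this claim yourself. The function name and strings below are made up; the point is only that the `if` body and the `else` body use different indentation widths and Python accepts it, while a change of indentation inside one block is rejected.

```python
def describe(flag):
    if flag:
      return "two-space block"
    else:
            return "eight-space block"

# Inconsistent indentation inside a single block is what Python refuses.
BAD = "if True:\n    a = 1\n      b = 2\n"
try:
    compile(BAD, "<config>", "exec")
    rejected = False
except IndentationError:
    rejected = True

print(describe(True), describe(False), rejected)
```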

Same that goes wrong in Python: Nothing. As long as tabs are outlawed, anyway.
I see many people continuing holy wars between tabs and spaces, even to a point to dismiss specific technology because of it.

If you simply enable the option in your editor to make whitespace visible (i.e. you can tell the difference between tab and space), the problem becomes silly.

I personally like to use tabs for indentation, because of the huge flexibility of being able to change tab size (which changes indentation) as needed, but that's just me.

It doesn't really matter which it is, as long as it's one of them. Spaces are more copy-paste friendly though.
> As long as tabs are outlawed, anyway.

When did that happen?

Tabs look like a light grey arrow in any editor I care to use.
That’s basically YAML (that they mentioned as a possibility). http://yaml.org/
They were suggesting an api for it as well. Who cares what the format is if you access it programmatically?
As much as people make fun of Java for putting abstractions over implementations this is a situation where I would love to see something like this. We can always switch to the hot new data format with something like this.
> Personally, I think the nicest and most expressive way is S-expressions.

Not only that. S-expr could be created automatically and run on the fly at runtime, new configurations could be added without restart of services, and Lisp has a well known powerful macro system. That way also "Python-ized" configs could be translated into s-expr easily. Such Lisp based configs could be run in a C embedded Lisp interpreter.
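
To show how little machinery reading such a config takes, here is a tiny s-expression reader sketched in Python. It only handles parentheses, bare symbols and integers (no quoted strings, no comments, no evaluation or macros), and the function names and sample config are invented for this example.

```python
def tokenize(text):
    """Split the text into parentheses and bare atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def atom(token):
    """Turn a token into an int when possible, otherwise keep it as a symbol string."""
    try:
        return int(token)
    except ValueError:
        return token

def read(tokens):
    """Consume tokens from the front and return one parsed expression."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected )")
    return atom(token)

def parse(text):
    tokens = tokenize(text)
    expr = read(tokens)
    if tokens:
        raise SyntaxError("trailing tokens after expression")
    return expr

# Hypothetical config written as an s-expression.
CONFIG = """
(server
  (host example.com)
  (port 80)
  (proxy (host proxy.example.com) (port 8080)))
"""

tree = parse(CONFIG)
print(tree)
```

The result is a nested list that a program can walk as plain data; evaluating it as code, as suggested above, would need an actual Lisp interpreter on top of this.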

However, the FreeBSD team should at least seriously consider a new neutral logo. I wonder what BSD Unix has to do with the devil.

> However, the FreeBSD team should at least seriously consider a new neutral logo. I wonder what BSD Unix has to do with the devil.

It's not a devil, it's a Daemon. Like, Unix daemons (httpd, ftpd etc.)

An evil daemon is worse than the devil (sarcasm)

edit: paren'd

edit 2: been using bsd and linux for servers for the past 10 years, love both of them for different reasons.

This is the first time I've seen anyone comment on the FreeBSD logo like that. I thought it was hilarious

You "BSD devil" trolls are really not putting in much effort.
why, the goal and purpose of the FreeBSD Unix is to agitate 1600s-style cavemen of course.