by jethro_tell 781 days ago
One of the problems that the yaml interpreter class of languages, or whatever you'd call them, suffers from is that yaml itself is a language and tends to be more or less undocumented in the interpreter docs.

It's sort of assumed that you are going to do extremely simple tasks on very flat data structures. That doesn't tend to be the reality that most of us live in. And to really get the most out of these languages you have to understand an entire unspoken set of rules on how to use yaml. That's never really pointed out in the docs.

Additionally, there are docs for the unique settings of each module, but when it comes to using the standard settings, it's rarely clear how to operate on the data that might be returned, or how to combine it with anything mildly complex. You are given a dozen one-stanza examples for each item, like a stack of ingredients, and then told to bake a cake.

I've had this experience with basically every one of the various yaml interpreter systems I've used.

After a few 100k lines of yaml I can get things done but the docs are useless other than a listing of settings.

3 comments

To illustrate this point, here is how to have a multi line value in yaml: just kidding, it’s so confusing that there is a whole website to help you figure it out: https://yaml-multiline.info/
Those examples don’t look confusing, except for there being more than one way.
Good luck remembering which is which :)
Isn’t that why toml is seemingly increasingly used to replace yaml in projects?
In my experience toml is worse at anything complex. It's nice as an .ini replacement but makes even yaml look sane in comparison if you want to use it for very complex or deeply nested stuff. But it wasn't designed to do that anyways
Am I alone in greatly preferring nesting in toml compared with yaml?
Did you try to convert any mid-complexity ansible role into toml? It was a very interesting exercise for me, and quite conclusive.
I hope not, toml is even worse at complex things and just slightly better at the stuff that isn't confusing. Add a k:v to a mildly complex dict.
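A rough illustration of the nesting complaint, as a minimal Python sketch. It walks a nested dict and lists the full dotted path that TOML's standard table syntax needs for every table header. The `config` data and the `table_headers` helper are made up for illustration, and this is not a TOML serializer.

```python
def table_headers(data, prefix=()):
    """Yield the dotted [table] header needed for each nested dict."""
    for key, value in data.items():
        if isinstance(value, dict):
            path = prefix + (key,)
            yield "[" + ".".join(path) + "]"
            yield from table_headers(value, path)


# Hypothetical config, three levels deep.
config = {
    "service": {
        "web": {
            "tls": {"enabled": True},
            "limits": {"max_conns": 100},
        }
    }
}

headers = list(table_headers(config))
for header in headers:
    print(header)
```

Every sibling table repeats the whole path, so adding one key means first finding the right header somewhere in the file.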

At this point, I'm pushing into a place where I'm just going to switch to go because it's getting to be a mess.

It’s insanely better at config.

It’s about as bad at being a programming language or data structure serialization format, though.

But yaml is fine at config. It sucks at looping, conditionals and data structures. If you aren't fixing that, it's just another standard we have to learn, so thanks for that.
No, it's not fine at config. It's a bag of weird corner cases and incompatible revisions which you have to debug instead of doing your actual work.

Here's a good summary: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...

Nope it isn't.

There are so many things that aren't expressible in TOML that any somewhat complex system will want... it's not even a contender.

So, one problem a lot of configurations are trying to solve: modularity. I.e. how to allow different actors to change the parts of the configuration they want. Everything under /etc nowadays is of the form /etc/*.d/*.*, that is, all configurations are directories of multiple files with some ridiculous rules (like "prefix file names with digits so that they sort and apply in the 'right' order", etc.). XML had a better approach with namespaces and schema, but maybe not perfect.
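For what the digit-prefix convention amounts to, here is a minimal Python sketch. It assumes fragments are applied in sorted filename order and that later files override earlier ones; real tools differ in both ordering and merge rules. The file names and settings are hypothetical.

```python
def merge_fragments(fragments):
    """Merge config fragments in filename order; later files win."""
    merged = {}
    for name in sorted(fragments):
        merged.update(fragments[name])
    return merged


# Hypothetical drop-in files, modelled as {filename: settings}.
fragments = {
    "50-site.conf": {"port": 8080, "workers": 4},
    "10-defaults.conf": {"port": 80, "workers": 2, "log_level": "info"},
    "90-local.conf": {"log_level": "debug"},
}

merged = merge_fragments(fragments)
print(merged)
```

The ordering lives entirely in the file names, which is the part the format itself knows nothing about.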

Polymorphism. Any non-trivial configuration system will have plenty of repeating parts. NetworkManager connection configurations? -- They are all derived from the same "template". Systemd device services -- same thing, they are all coming from the same "template". There are plenty more examples of this. But, languages like YAML or TOML don't have a concept of polymorphism in them. This is never encoded in the configuration itself. Instead, every tool that needs to be configured and needs some degree of polymorphism rolls its own version.
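The "template" idea can be sketched in a few lines of Python: one base config plus per-instance overrides. This is only an illustration of the concept, not how NetworkManager or systemd implement it; `BASE_CONNECTION` and `derive` are hypothetical names.

```python
# Hypothetical template that every connection profile derives from.
BASE_CONNECTION = {"type": "ethernet", "autoconnect": True, "mtu": 1500}


def derive(base, **overrides):
    """Return a new config: a copy of the base with some keys overridden."""
    return {**base, **overrides}


office = derive(BASE_CONNECTION, mtu=9000)
guest_wifi = derive(BASE_CONNECTION, type="wifi", autoconnect=False)

print(office)
print(guest_wifi)
```

Each tool that rolls its own version has to answer the same questions again: what the base is, how overrides merge, and whether nested values merge or replace.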

Constraints. It's often impossible to describe the desired configuration through specifying the exact values. Often the goal can be described as "less than" or "given the value of X, Y should be a multiple of X" and so on. Such concepts are, again, not expressible in TOML or YAML and friends.

NB. Types are a kind of constraint.
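As a sketch of what such constraints look like once they live in code rather than in the config format, here is a small Python check. The keys `x` and `y` follow the example above; the upper bound of 1024 and the `violations` helper are assumptions made up for illustration.

```python
def violations(config):
    """Return the list of violated constraints (empty means valid)."""
    x, y = config["x"], config["y"]
    # Types are constraints too: check them before doing arithmetic.
    if not (isinstance(x, int) and isinstance(y, int)):
        return ["x and y must be integers"]
    errors = []
    if x >= 1024:  # hypothetical "less than" constraint
        errors.append("x must be less than 1024")
    if x == 0 or y % x != 0:  # relational constraint between two values
        errors.append("y must be a multiple of x")
    return errors


print(violations({"x": 4, "y": 12}))
print(violations({"x": 5, "y": 12}))
```

None of this can be stated inside the YAML or TOML file itself, so it ends up in whatever program reads the file.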

Identity. It's often necessary in configuration to distinguish between two sub-sections that look the same and two sub-sections that designate the same exact object. Like, when configuring VMs with e.g. disks: are they supposed to mount the same disk, or does each VM need a separate disk that just has the same physical characteristics?
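The distinction is the same one programming languages draw between equality and identity. A minimal Python sketch, with hypothetical VM and disk data:

```python
disk_spec = {"size_gb": 100, "kind": "ssd"}

# Two VMs that each get their own disk with the same characteristics.
vm_a = {"name": "a", "disk": dict(disk_spec)}
vm_b = {"name": "b", "disk": dict(disk_spec)}

# Two VMs that mount the very same disk object.
shared = dict(disk_spec)
vm_c = {"name": "c", "disk": shared}
vm_d = {"name": "d", "disk": shared}

same_shape = vm_a["disk"] == vm_b["disk"]       # equal contents
same_object_ab = vm_a["disk"] is vm_b["disk"]   # separate disks
same_object_cd = vm_c["disk"] is vm_d["disk"]   # one shared disk

print(same_shape, same_object_ab, same_object_cd)
```

Written out as plain nested data, both pairs would look identical, which is exactly the ambiguity described above.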

But then, what is a config file if not a representation of a data structure?
With the restriction that it has to be represented in human-friendly text.

JSON’s a crazy-bad serialization format, too, for that matter. It doesn’t even know what a damn integer is.
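One reading of the integer complaint: the JSON grammar has a single number type, and parsers that store every number as a 64-bit float silently lose large integers. A small Python sketch, where `parse_int=float` is used only to mimic such a double-only parser and the `id` field is made up:

```python
import json

text = '{"id": 9007199254740993}'  # 2**53 + 1

as_python = json.loads(text)                   # Python keeps an exact int
as_double = json.loads(text, parse_int=float)  # mimic a double-only parser

print(as_python["id"])
print(int(as_double["id"]))
```

The two results differ by one, and nothing in the document tells a reader which behaviour to expect.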

Toml is great for simple use-cases. For complex ones you have the same problem that yaml has: Templating a language with significant whitespace via text substitution is a horrible horrible idea. Somehow this sad state of affairs has become industry standard.
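A minimal Python sketch of why this goes wrong, using `string.Template` as a stand-in for any text-substitution templating tool. The YAML snippet, the `message` value, and the `indent_continuation` helper are all hypothetical.

```python
from string import Template

template = Template(
    "server:\n"
    "  motd: |\n"
    "    $message\n"
    "  port: 8080\n"
)

message = "line one\nline two"  # hypothetical multi-line value

# Naive substitution: the second line lands at column 0,
# outside the indented block it was supposed to belong to.
naive = template.substitute(message=message)


def indent_continuation(value, spaces):
    """Pad every line after the first so it matches the insertion point."""
    return value.replace("\n", "\n" + " " * spaces)


# The caller has to know the indentation depth of the placeholder.
fixed = template.substitute(message=indent_continuation(message, 4))

print(naive)
print(fixed)
```

The template author ends up tracking indentation depth by hand at every substitution site, because the templating layer has no idea it is producing structured data.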
It's not even the white space; a good linter or language server can handle that. It's not that so much as the fact that the most complex data structure is a list.

If you want to get crazy, you can push a dict into a list and operate on it, but it gets tough at the second level. And don't get me started on if/else statements.

I honestly wonder why not just write your web server in node or something. It would be traceable and testable and probably performant enough. There's just so much arcana inside platforms like traefik or nginx where they do all this miraculous stuff if you just add the right flags, but also when it doesn't work it's a total black box and there's no way to discover what it thinks it's doing.
I like that "probably" for performant here, but let's focus on features for now.

Let's imagine we go this way: we implement our own self-made web server in nodejs, start using it, and the next day we are required to add simple things like basic auth for a specific location, or an ACL based on Maxmind geo data, or even straightforward round-robin balancing among several php-fpm upstreams, even without weights. What would the flow be here? Involving the dev team and trying to put those tasks into their backlog?

Why is that better than just adjusting the Nginx config in 5-10 minutes?

The Traefik dashboard is pretty helpful for visualizing what's happening. Also, their error messages are usually pretty clear about what's wrong.