Hacker News new | ask | show | jobs
by lucideer 264 days ago
I feel like these two tenets - (1) yaml should require quotes & (2) the value in yaml is in recursion/anchors - are fundamentally the opposite of why yaml exists & why people use it.

The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimeters. This is done using a combination of white-space delimiting for structure, & heuristic parsing for values. The latter is fundamentally flawed, but yaml fans think the flaws are a worthwhile trade-off. If you're going to bring delimiters in as a requirement, imho yaml loses its raison d'être.

Recursion/anchors/etc. on the other hand are optional extras that few use & some parsers don't even support. If they were the driving value of yaml they'd be more ubiquitous.

Disclaimer: I hate yaml & wish it didn't exist, but I do understand why it does & I frankly don't have a great suggestion for alternatives that would fill those needs. Toml is also flawed.

4 comments

Genuinely curious - What major flaws does TOML have? I've used it before and it seems like a simple no-nonsense config language. Plenty of blog articles about the flaws behind YAML, I don't really see complaints about TOML!
INI-like formats are perfectly fine for config files with at most one layer of nesting/sections. TOML is a perfectly fine INI-like parser. Its definitions and support for strings, numbers, comments, sections and simple arrays are great. But its main claim to fame is extending INI to support arbitrary levels nesting of arrays and dictionaries like JSON, and IMO it does a horrible job at it.

With JSON, YAML, XML and many other formats, the syntax for nesting has a visual appearance that matches the logical nesting. TOML does not. You have to maintain a mental model of the data structure, and slot the flat syntax into that structure.

Furthermore, there are multiple ways to express the same thing like

  [fruit.apple]
  color = "red"
or

  [fruit]
  apple.color = "red"
It isn't always obvious which approach is more appropriate for a task, and mixing them creates a big mess.

And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.

> And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.

While I have some minor annoyances with TOML, I counterintuitively consider it a strength of the format that nesting quickly becomes untenable, because it produces pressure on the designers of config file schemas to keep nesting to a minimum.

Maybe some projects have a legitimate need for something more complex, but IMO config files are at their best when they're just key-value pairs organized into sections.

As far as I can see, nobody originally constrained the problem to config files. So I guess the problem with TOML is that it's only good for config files, while JSON and TOML are general purpose.
Yes, I think that's a fair characterization. The priorities of config file formats are different than the priorities of human-readable arbitrary data serialization and transmission formats.
TOML is basically a formalization of the old INI format, which only existed in ad-hoc implementations. It's not really a "language", just a config data syntax like JSON. It doesn't have major footguns because it doesn't have a lot of surface area.

The various features it has for nesting and arrays make it convenient to write, but can make it harder to read. There is no canonical serialization of a TOML document, as far as I can tell, you could do it any number of ways.

So while TOML has its use for small config files you edit by hand, it doesn't really make sense for interchange, and it doesn't see much use outside of Rust afaik.

I believe TOML can always be serialized to JSON. And TOML is in the python standard library in newer pythons. It’s also used as the suggest format for `pyproject.toml` in python
> I don't really see complaints about TOML!

Sampling bias, there are no complaints about it because no-one uses it (jk).

It's subjective of course but despite the name TOML never really seemed that 'Obvious' to me, in particular the spec for tables. I also think the leniency in the syntax isn't necessarily a good feature and serves to make it less 'Minimal' than its name suggests.

toml is just not human friendly unless you're just using a super simple object with as little nesting as possible. As soon as you increase the nesting you need yaml or json
> The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimeters.

Along with a coworker, I wrote the package manager for Dart, which uses YAML for its main manifest file (pubspec.yaml). The lack of delimiters is kind of nice but wasn't instrumental in the choice to use YAML.

It's because JSON doesn't have comments.

If there was a JSON+comments what was specified and widely compatible, we would have used that. YAML really is a brittle nightmare, and the lack of delimiters cause problems as often as they solve them. We wrote a YAML parser from scratch and I still get the indentation on lists wrong sometimes.

But YAML lets you actually, you know, comment out a line of text in it temporarily, and that's really fucking handy. I think of Crockford had left comments in JSON, YAML would be dead.

JSONC is JSON with comments (and trailing commas) and it's fairly widely supported, namely because VS Code ships with support built in and they use it for all their config files. I've seen libraries for a number of languages.

VS code defaults to complaining about trailing commas though (the warnings can be turned off though (it feels like a hack and they didn't properly document it though (it is an officially sanctioned procedure though))).

> It's because JSON doesn't have comments.

This is a big plus but JSON5 has pretty widespread language library support - probably equal to that of YAML tbh (e.g. Swift has native JSON5 support, I don't know that anyone natively supports YAML). Any reason not to opt for it here?

I believe JSON5 didn't exist when we first wrote pub. If it did, it certainly wasn't widely known.

Obviously, migrating to it now when there are thousands and thousands of packages and dozens of tools all reading pubspecs would be much more trouble than it's worth.

Understandable. I just checked & JSON5 was just 1 year later but even then it would've taken a lot longer to gain sufficient traction to be well supported.
Most protocols defined in RFCs require the use of regular JSON. You don’t have a choice.
Not sure what context you're referring to but we're discussing configuration file formats, not data transports, so I doubt that would be a frequent issue.
I see where you are coming from but YAML anchors are definitely a great and powerful feature that deserves more attention. The other day I was refactoring a broken [1] k8s deployment based on a 3rd-party Helm chart and since I didn't have the time to migrate to a better chart, YAML anchors permitted me to easily reduce YAML duplication, with everything else (Helm, Kustomize, Flux, Kubernetes) completely unaware of anything. Just a standard YAML pattern.

[1] the broken part was due to an ex-coworker that cheated his way out of GitOps and left basically "fake code" committed, and modified by hand (with Lens) the deployment to make it work

Is - not effectively an opening delimiter?

If we want to avoid quoting in particular, then we could use - for strings and anything else for non-strings. But the heuristics suck.