by rkangel 2439 days ago
The fact that they're isomorphic to a machine partly misses the point. YAML is immensely friendlier for a human to write (and read). YAML is used when people need to write declarative instructions for machines, and it does a good job of that. XML is much more of a pain to read and write by hand.

I used to think YAML was friendly for humans to read. Then I wrote a parser for it, and discovered all the weird corners, edge cases, etc. I now consider it to be a fairly user-hostile format, which should be avoided in favor of just about everything else (XML, JSON, TOML, text protobuf, etc are all more friendly).

For example, consider this map of regions in YAML:

    regions:
      northamerica: [ca, us, mx]
      scandinavia: [dk, no, se, ax, fi, fo, gl, is, sj]
Spot the error!

Writing a parser is also a bit of a nightmare, because there are a bunch of features which can turn a bit dangerous if you’re not careful—things like cyclic graphs or declaring types of objects. These are complete non-issues for the other formats I listed above—they’re all trees, and it’s very unusual for parsers to let you instantiate unintended types with those formats.
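To make the "they're all trees" point concrete, here is a small Python sketch using only the standard library `json` module. YAML anchors and aliases can make a node contain itself (something like `&a [*a]`), while JSON has no syntax for that, and Python's encoder refuses to serialize a self-referencing list. The helper name `try_dump` and the sample values are made up for illustration.

```python
import json

# Build a list that contains itself: a cyclic graph, not a tree.
cyclic = ["region"]
cyclic.append(cyclic)

def try_dump(value):
    """Return the JSON text, or an error message if value cannot be encoded."""
    try:
        return json.dumps(value)
    except ValueError as exc:
        return f"error: {exc}"

tree_result = try_dump(["region", ["nested"]])   # an ordinary tree
cycle_result = try_dump(cyclic)                  # a cycle

print(tree_result)
print(cycle_result)
```

The tree serializes normally; the cyclic value comes back as an error string, because the format simply has no way to express it.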

> Spot the error!

I know this is rhetorical, but I've been bitten by this enough times, so for those who don't know: `no` will translate to a boolean false.
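For anyone who wants to see the mechanism, here is a minimal Python sketch of the implicit typing rule. It is not a real YAML parser. It assumes the YAML 1.1 boolean word list (`y`/`n`, `yes`/`no`, `true`/`false`, `on`/`off` in lower-case, Capitalized and UPPER-CASE spellings); the names `resolve_scalar` and `parse_flow_list` are invented for illustration, and real parsers differ in which of these words they accept.

```python
# Plain (unquoted) scalars that YAML 1.1 resolves to booleans.
# Assumption: this follows the YAML 1.1 bool type; parsers vary in practice.
TRUE_WORDS = {"y", "yes", "true", "on"}
FALSE_WORDS = {"n", "no", "false", "off"}

def _spellings(words):
    # Only lower-case, Capitalized and UPPER-CASE forms are listed.
    return {s for w in words for s in (w, w.capitalize(), w.upper())}

TRUE_SPELLINGS = _spellings(TRUE_WORDS)
FALSE_SPELLINGS = _spellings(FALSE_WORDS)

def resolve_scalar(text):
    """Resolve one scalar the way an implicit-typing parser might."""
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]  # quoted scalars always stay strings
    if text in TRUE_SPELLINGS:
        return True
    if text in FALSE_SPELLINGS:
        return False
    return text

def parse_flow_list(text):
    """Parse a simple one-line flow sequence such as "[dk, no, se]"."""
    inner = text.strip()[1:-1]
    return [resolve_scalar(item.strip()) for item in inner.split(",")]

scandinavia = parse_flow_list("[dk, no, se, ax, fi, fo, gl, is, sj]")
quoted = parse_flow_list("[dk, 'no', se]")

print(scandinavia)
print(quoted)
```

In the first list the second element comes out as `False` rather than the string for Norway; in the second, quoting keeps it a string.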

Am I the only one who likes single quotes around literal strings?
YAML is not nice, but just quote every string that is a string and many corner cases go away.
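
A rough sketch of that habit in code form, for configs that are generated rather than typed by hand: a hypothetical helper that single-quotes every string before writing a flow list. It relies on the YAML rule that a single quote inside a single-quoted scalar is written as two single quotes. `quote_scalar` and `flow_list` are names invented here, not part of any library, and multi-line strings are not handled.

```python
def quote_scalar(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def flow_list(key, values):
    """Render `key: ['a', 'b']` with every item explicitly quoted."""
    return f"{key}: [{', '.join(quote_scalar(v) for v in values)}]"

line = flow_list("scandinavia", ["dk", "no", "se"])
print(line)
```

With every item quoted, `no` is written out as a string and never reaches the implicit boolean rule.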

Thanks. I was staring at the snippet wondering. I'm not all that familiar with YAML, so I thought perhaps all the values needed to be quoted rather than just written as is.

Curious if you wrote parsers for the other languages you claim are easier. YAML has problem areas, particularly around implicit booleans, but languages without any comment syntax (i.e. JSON) cannot be considered human-friendly. And XML is not even the same sort of language as the rest of these.

I understand thinking YAML makes the wrong tradeoffs, but if you think it's less friendly than XML, then you haven't really worked with XML.

> Curious if you wrote parsers for the other languages you claim are easier.

Yes. YAML was a damn mess compared to the others. You can get a rough estimate of how much by looking at the size of the specs—the XML spec is a fair bit shorter than YAML’s, and if you drop the part about DTDs (which are used less these days) the difference is even bigger. The TOML spec is far, far shorter than either one and the JSON spec makes the TOML spec look big.

I write a lot of parsers. I think it’s fun.

> …but if you think it's less friendly than XML, then you haven't really worked with XML.

If you want to talk about formats, let’s talk about formats. If you make claims that I must be inexperienced because I disagree with you, then it’s just rude.

I have done a few reasonably sized projects with heavy XML use: a build system, some work with RPCs, and a web app where I wrote a ton of data for it in XML format, by hand. I also wrote an XML pretty-printer and a YAML pretty-printer. I did a conversion of the XML build system to YAML. I thought it was a bad tradeoff, so I reverted it. Since then I’ve migrated to Bazel. All this experience is a mix of hobby projects and professional work.

The bad for XML—it’s more verbose. You have to decide on your own mapping between XML and data. That’s it, as far as I’m concerned.

My personal sense of it is YAML is in a pretty awkward place—it only makes sense for human authoring, not data exchange. My experience with it is that people will naturally want to automatically generate things that they would otherwise have to write by hand. So if you draw a Venn diagram, the YAML use cases are “human authored but not machine generated”.

If we think of using these formats for configuration, then the BIG problem is the sliding scale between pure-data approaches to configuration and using code for configuration. As systems mature and get more complex, the configs often acquire features of programming languages, or parts of the config get rewritten in code. This is where YAML really suffers. XML is a bit easier, either to extend to add these kinds of features or to emit from code.

XML is just a canonical form and proper subset of SGML: it always requires quotes around attribute values and explicit start- and end-element tags, and it has no short references (Wiki syntaxes) or other constructs which can be (unambiguously) omitted in SGML as directed by a DTD grammar. As such, XML is a machine format rather than a format intended for editing by humans, and it's odd to complain about XML being unfriendly to edit when that's what SGML is for.

My point is that YAML isn't easier to write at all. It isn't as verbose, but it bites you all over the place with unexpected behavior. The fact that validating the schema isn't the same as checking the syntax is super frustrating: you can create a valid YAML file with a typo, and it'll be either an invalid or a no-op configuration. I'd rather have a proper DSL, preferably strongly typed.
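
One way to picture the gap between "parses fine" and "means what I intended": a small Python sketch in which an already-parsed config mapping is loaded into a typed structure that rejects unknown keys. `ServerConfig`, its fields, `EXPECTED_TYPES`, and the misspelled key are all hypothetical; this only approximates the "strongly typed" idea with standard-library dataclasses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int = 8080

# Hypothetical schema: the keys we accept and the type each must have.
EXPECTED_TYPES = {"host": str, "port": int}

def load_config(raw):
    """Build a ServerConfig, refusing keys and types the schema doesn't know."""
    unknown = sorted(set(raw) - set(EXPECTED_TYPES))
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    for key, value in raw.items():
        if not isinstance(value, EXPECTED_TYPES[key]):
            raise ValueError(f"{key} should be {EXPECTED_TYPES[key].__name__}")
    return ServerConfig(**raw)

def check(raw):
    """Return the loaded config, or the validation error as text."""
    try:
        return load_config(raw)
    except ValueError as exc:
        return str(exc)

good = check({"host": "example.org", "port": 443})
typo = check({"host": "example.org", "prot": 443})  # "prot" is a typo for "port"

print(good)
print(typo)
```

Both mappings are syntactically fine in any data format. Only the schema check notices that `prot` is not a real setting; without it, the typo would silently leave `port` at its default.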