by wmil 4698 days ago
YAML is neat, but library developers have a history of writing unsafe YAML parsers.

There's the famous Rails vulnerability due to YAML. Python needed to add 'yaml.safe_load'.

YAML is a little too rich. It's always one poorly thought out convenience feature away from disaster.

3 comments

Hence TOML was born: https://github.com/mojombo/toml

It has parsers for nearly every language; I wrote one for JS: http://npmjs.org/package/tomljs

And JSON was often “parsed” with eval().
That's not really a problem with JSON though, is it? Anything you run through eval() is a disaster in the making. Maybe the problem is that people are trying to make data formats too powerful, and too many things that don't need to be are creeping towards Turing completeness.

I think parsers for JSON, YAML, INI, etc. should be designed in such a way as to make it impossible to assign anything like an object, class, function, etc. Numbers, strings, and collections of numbers and strings... that's all you should get (though obviously "string" is fraught with peril). Anything more is unnecessarily complex.
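To make the "numbers, strings, and collections" idea concrete, here is a minimal sketch in Python. The helper name `ensure_plain` is hypothetical, and allowing booleans and null alongside numbers and strings is my own assumption (JSON has them); it is one way to enforce the rule after parsing, not how any particular library does it.

```python
import json

# Scalars treated as "plain data". Booleans and None are an assumption
# added here because JSON has true/false/null.
PLAIN_SCALARS = (str, int, float, bool, type(None))


def ensure_plain(value, path="$"):
    """Return value unchanged if it contains only plain scalars, lists,
    and dicts with string keys; raise TypeError otherwise."""
    if isinstance(value, PLAIN_SCALARS):
        return value
    if isinstance(value, list):
        for index, item in enumerate(value):
            ensure_plain(item, f"{path}[{index}]")
        return value
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"{path}: non-string key {key!r}")
            ensure_plain(item, f"{path}.{key}")
        return value
    # Functions, classes, arbitrary objects and so on end up here.
    raise TypeError(f"{path}: {type(value).__name__} is not plain data")


# Data from the standard json module passes the check.
config = ensure_plain(json.loads('{"a": [1, "x", 2.5]}'))
```

Running the same check on the output of a richer parser would reject anything that smuggled in a function or a custom object.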

It is a problem with JSON in the sense that it's a JavaScript subset, 'in practice' - modulo the Unicode support that goes beyond JavaScript. So it's to be expected that eval() will be used as a convenience by developers, ignoring the security implication that comes with eval() executing full JavaScript.
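The same trap is easy to show outside JavaScript. A rough sketch in Python (so this is Python's eval(), not the JavaScript one discussed here), where `side_effect` is a hypothetical stand-in for whatever code an attacker would like to run:

```python
import json

calls = []


def side_effect():
    """Hypothetical stand-in for attacker-controlled code."""
    calls.append("ran")
    return 1


# Looks like data, but contains a function call.
payload = '{"a": side_effect()}'


def parse_with_eval(text):
    # Unsafe: eval() executes any expression in the text, not just literals.
    return eval(text, {"side_effect": side_effect})


def parse_with_json(text):
    # A real parser only accepts the JSON grammar and raises
    # json.JSONDecodeError on anything else.
    return json.loads(text)


result = parse_with_eval(payload)  # the embedded call runs

try:
    parse_with_json(payload)
except json.JSONDecodeError:
    rejected = True  # the parser refuses the payload without running it
```

The eval() version happily returns a dict and runs the call along the way; the parser version never executes anything.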

The way to have avoided the issue would have been for JSON to have a grammar that broke eval(). But one could argue the ability to pass JSON into eval() to get JavaScript is one of the reasons JSON became popular to begin with.

Agreed.

YAML is easy to type, even with the whitespace. So is INI. And as verbose as XML is, it's easier, ime, to type than JSON. Of those four, JSON is the hardest to write by hand; certainly it's the one I make the most mistakes with, to the extent that I have a particular technique for writing it out (prefixing the commas). As a result JSON as a config file format is tedious, verbose, and error prone; its sweet spot is a machine interchange format that a human can debug/read if needed.
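For anyone who hasn't seen the comma-prefix layout, it might look roughly like this. The keys and values are made up for illustration, and Python's standard json module is used only to confirm that the text is still valid JSON:

```python
import json

# Comma-first layout: each new entry starts with its comma, so a missing
# or trailing comma stands out in the left-hand column.
text = """
{ "name": "example"
, "port": 8080
, "tags": [ "a"
          , "b"
          ]
}
"""

config = json.loads(text)  # whitespace is insignificant, so this parses
```

It doesn't make JSON any less verbose, it just makes the usual comma mistakes easier to spot.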