by NathanKP 4698 days ago
This hack, while nice, is still just a workaround. I highly recommend that, if you can, you use YAML instead of JSON in as many places as possible.

JSON works great for on-the-fly communication with frontends that are running JavaScript, or for communication between JavaScript processes like Node.js servers. But for configuration files and other things that need comments, YAML is many times better, both for its clean, Markdown-reminiscent structure and for its native comment support.

Node.js has a great module called js-yaml (https://github.com/nodeca/js-yaml) which automatically registers handlers for .yml and .yaml files, allowing you to require them in your Node.js code just like you can with JSON files.

It also comes with a YAML parser for the browser side of things, so if you want, you could even communicate YAML directly from the server to the client side, although frankly I don't see much advantage to sending YAML over the wire instead of JSON. (And as others have mentioned below, untrusted sources could insert malicious objects in YAML, so I wouldn't recommend this technique.)

You can even use YAML for your package.json in a Node program: (https://npmjs.org/package/npm-yaml)

7 comments

YAML is neat, but library developers have a history of writing unsafe YAML parsers.

There's the famous Rails vulnerability due to YAML. Python needed to add 'yaml.safe_load'.

YAML is a little too rich. It's always one poorly thought out convenience feature away from disaster.

Hence TOML was born: https://github.com/mojombo/toml

It has parsers for nearly every language; I wrote one for JS: http://npmjs.org/package/tomljs

And JSON was often “parsed” with eval().

That's not really a problem with JSON though, is it? Anything you run through eval() is a disaster in the making. Maybe the problem is that people are trying to make data formats too powerful, and too many things seem to be creeping towards Turing completeness that don't need to be.

I think parsers for JSON, YAML, INI, etc. should be designed in such a way as to make it impossible to assign anything like an object, class, function, etc. Numbers, strings, and collections of numbers and strings... that's all you should get (though obviously "string" is fraught with peril). Anything more is unnecessarily complex.
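A minimal sketch of that restriction in JavaScript, assuming some parser has already produced a value and we only want to accept numbers, strings, and collections of them. The name `isPlainData` is made up for illustration, and booleans and null are rejected only because the comment above doesn't list them:

```javascript
// Hypothetical helper: accepts only finite numbers, strings, arrays,
// and plain objects whose values are themselves plain data.
// Assumes the input has no cycles.
function isPlainData(value) {
  if (typeof value === 'string') return true;
  if (typeof value === 'number') return Number.isFinite(value);
  if (Array.isArray(value)) return value.every((item) => isPlainData(item));
  if (value !== null && typeof value === 'object') {
    // Reject class instances (Date, Map, user classes, ...).
    const proto = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) return false;
    return Object.values(value).every((item) => isPlainData(item));
  }
  // Functions, booleans, null, undefined, symbols, and bigints end up here.
  return false;
}

console.log(isPlainData({ name: 'demo', ports: [80, 443] })); // true
console.log(isPlainData({ handler: () => 1 }));               // false
console.log(isPlainData({ created: new Date() }));            // false
```

A loader could run a check like this right after parsing and refuse the whole file if it fails, rather than trying to sanitize it.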

It is a problem with JSON in the sense that it's a JavaScript subset 'in practice', modulo the Unicode support that goes beyond JavaScript. So it's to be expected that eval() will be used as a convenience by developers, ignoring the security implications that come with eval() hoisting full JavaScript.

The way to have avoided the issue would have been for JSON to have a grammar that broke eval(). But one could argue the ability to pass JSON into eval() to get JavaScript is one of the reasons JSON became popular to begin with.
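To make the eval() point concrete, here is a small sketch. The payload string is hypothetical: it is a valid JavaScript expression but not valid JSON, so eval() runs its side effect while JSON.parse() refuses it.

```javascript
// Hypothetical untrusted input: looks like JSON, but one value is code.
const untrusted = '{"name": "demo", "side": (globalThis.pwned = true)}';

// The classic eval() "parser": wrap in parentheses and evaluate.
const viaEval = eval('(' + untrusted + ')');
console.log(viaEval.name);     // demo
console.log(globalThis.pwned); // true (the embedded code ran)

// JSON.parse() accepts only the JSON grammar and throws on the same input.
let parseError = null;
try {
  JSON.parse(untrusted);
} catch (err) {
  parseError = err;
}
console.log(parseError instanceof SyntaxError); // true
```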

Agreed.

YAML is easy to type, even with the whitespace. So is INI. And as verbose as XML is, it's easier, ime, to type than JSON. Of those four, JSON is the hardest to write by hand; certainly it's the one I make the most mistakes with, to the extent that I have a particular technique for writing it out (prefixing the commas). As a result, JSON as a config file format is tedious, verbose, and error-prone; its sweet spot is a machine interchange format that a human can debug/read if needed.
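For anyone who hasn't seen the comma-prefixing technique, this is roughly what it looks like, with the JSON held in a JavaScript string so it can be checked with JSON.parse(). The keys and values are invented for illustration:

```javascript
// Comma-first layout: every line after the first starts with its comma,
// so a missing or trailing comma is easy to spot in the left margin.
const commaFirstText = `
{ "name": "demo"
, "port": 8080
, "tags": ["a", "b"]
}`;

// It is still ordinary JSON, so a standard parser reads it unchanged.
const commaFirstConfig = JSON.parse(commaFirstText);
console.log(commaFirstConfig.port); // 8080
```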

This hack, while nice, is still just a work around. I highly recommend that if you can, in as many places as possible use YAML instead of JSON.

Rails RCE, sup

I've actually never developed anything serious in Rails. I just don't like the framework, and the performance of Rails leaves a lot to be desired in my opinion. I'm a 100% Node.js convert these days.

But I do like the Rails convention of using YAML format and have adopted that in my own code as much as possible.

I think he's referring to the Rails YAML exploit [0], because you can use YAML to create objects, like this:

    --- !ruby/hash:ActionDispatch::Routing::RouteSet::NamedRouteCollection
     'foo; eval(eval(puts '=== hello there'.inspect);': !ruby/object:OpenStruct
       table:
        :defaults: {}
This allows people to run arbitrary code on Rails servers.

[0] http://rubysource.com/anatomy-of-an-exploit-an-in-depth-look...

Yeah, I had read about that. One more reason not to send YAML over the wire. YAML makes great sense for your internal configuration files and internal data structures where you need comments and readability. YAML is perfectly safe here because chances are you aren't going to be exploiting yourself by putting malicious objects in your YAML.

But for over-the-wire communication, JSON makes more sense than YAML. Parsing unsafe YAML from an untrusted client could cause exploits like the one you mentioned, and YAML also depends on indentation and line breaks, which makes communication with the client side much more awkward than just sending JSON to the client or receiving JSON from it.

I believe the parent was referring to the many recent YAML-based vulnerabilities found in Rails (and elsewhere). He is basically saying, "You can use YAML -- if you don't care about injection vulnerabilities."

In my experience, YAML is better for configuration files and human-edited files. JSON is better for data and communication between computers. The features that make YAML easier to write (comments, more flexible format, less quoting) make it more complex and slower to parse.

Also, many of the security holes in YAML come from its use as a serialization format which can represent native classes. I wish the YAML parsers had more explicit support for simple data schemas which would reduce the security risk and be sufficient for most configuration files.

Ironically, YAML has object serialization features out the wazoo, and JSON is relatively spartan for that purpose. I will never understand why it happened that way around. YAML should have been left as a human-readable format with none of the object serialization stuff thrown in.

While on the topic of encodings (I'm a huge encodings geek), let me plug a new one we recently discovered called Space (https://github.com/nudgepad/space). It is dead simple and has the nice feature that it is extraordinarily easy for both humans and machines to read and write.

It is definitely very minimalist. Personally I have issues parsing it visually though, because the indentation of only one space makes it hard to differentiate inner data structures, particularly on a large screen with small fonts. Additionally, the lack of a division character other than a space between the key and the value makes reading each key-value pair much harder, because the key and value tend to run together visually.

Thanks for the feedback. I totally agree with you.

Adding easy syntax highlighting is my next step to address this problem.

YAML is excellent for resource files, i.e. human editing complex data.

For -configuration- you want a simpler format; INI is worth considering, as is http://p3rl.org/JSONY which is ingy's implementation of a vision we thrashed out for a more sysadmin-friendly config format.

I agree it is a cute hack, but it is also kind of horrifying. You are depending on an undocumented behavior that happens to be shared across the ecosystem. Now what happens if that file hits a parser which takes the first instance, or a functional one that errors out when it sees multiple assignments?
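A rough sketch of that concern, assuming the hack in question is the duplicate-key trick (reusing a key so the first value acts as a comment). JSON.parse() in JavaScript keeps the last value, but the JSON RFC only says names should be unique and calls the behaviour for duplicates unpredictable. The `buildObject` helper and its policy names are made up here to mimic what other parsers might do with the same sequence of pairs:

```javascript
// Hypothetical config that abuses a duplicate key as a comment.
const hackedText = '{"port": "the port to listen on", "port": 8080}';

// JavaScript's JSON.parse keeps the last value for a repeated key.
const lastWins = JSON.parse(hackedText);
console.log(lastWins.port); // 8080

// The same key/value sequence as other parsers would see it.
const pairs = [['port', 'the port to listen on'], ['port', 8080]];

// Illustrative only: builds an object under a given duplicate-key policy.
function buildObject(entries, policy) {
  const out = {};
  for (const [key, value] of entries) {
    const seen = Object.prototype.hasOwnProperty.call(out, key);
    if (seen && policy === 'strict') throw new Error('duplicate key: ' + key);
    if (seen && policy === 'first') continue; // keep the earlier value
    out[key] = value;
  }
  return out;
}

console.log(buildObject(pairs, 'first').port); // the port to listen on

try {
  buildObject(pairs, 'strict');
} catch (err) {
  console.log(err.message); // duplicate key: port
}
```

With a first-wins parser the "comment" silently becomes the port, which is arguably worse than the strict parser's loud failure.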

+1 re YAML

I used some YAML to configure internal systems, and the impression of my teammates was that it was a bit fragile. Maybe we were using it wrong?

It is dependent on a specific indentation format, which is one thing I dislike about it. But if you configure vim, or whatever editor you use, to properly indent YAML files, you should have few issues with fragility.

Even with indentation problems, the time saved by not typing curly brackets, extra quotation marks, and commas, plus the time saved by not having to visually parse them when reading YAML, more than makes up for the occasional data structure bug caused by bad indentation.