by tptacek 4530 days ago
XXEs are awful. You wouldn't think that simply by parsing an XML file --- something so simple people are tempted to do it with regexes --- you'd be invoking machinery that translates the XML language and binds it to, in effect, scripting language features. But that's what you're doing when you use common XML libraries!

For applications on mainstream stacks, if you accept XML inputs (explicitly accept them, that is; as in, invoke the XML parser yourself) and haven't taken the time to make sure you're not expanding entities, the safest bet is to assume that your XML parser has a "let inbound XML run shell commands" feature embedded into it. That's an oversimplification, but maybe not much of one.
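
A minimal sketch, in Python, of one way to make sure entities are never expanded: refuse any document that carries a DOCTYPE at all, since that is where entities get declared. The helper name `parse_without_dtd` and the exception class are hypothetical, and this assumes your inputs never legitimately need a DTD.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an incoming document declares a DOCTYPE (hypothetical name)."""


def parse_without_dtd(xml_text: str) -> ET.Element:
    """Parse XML, but reject any document that contains a DOCTYPE declaration.

    Entities are declared inside the DTD, so refusing the DTD outright
    means there is nothing for the parser to expand.
    """

    def reject_doctype(name, sysid, pubid, has_internal_subset):
        # Called by expat as soon as it sees "<!DOCTYPE ...".
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(xml_text, True)

    # Second pass: the document is DTD-free, so build the tree as usual.
    return ET.fromstring(xml_text)
```

Parsing twice is wasteful but keeps the sketch on documented standard-library calls only. The third-party defusedxml package offers a comparable "forbid DTD" switch if you would rather not roll your own.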

This is a great, subtle finding. And Reginaldo handled it like a pro. Let the feeding frenzy for hiring Reginaldo Silva... commence! :)

3 comments

I don't know if you read it, but I sent you an email about this same bug (when I originally found it in Drupal) in 2012. Didn't know FB was vulnerable back then. By the way, I learned a lot from you here on HN. So let me take this opportunity and say thank you very much.
I did! I responded to your first mail, too! :)

When I saw your name, it looked familiar, and I went and looked up your old mail. Great work! Congrats on an awesome finding.

So by default many XML libraries essentially allow remote code execution?

How in the world is that ok? How is that the standard?

Which platforms? Really, I am curious. Checking our XML processors, there is nothing there that could lead to execution of what is within the XML.

Are there examples somewhere I can see to understand how this is even possible?
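
For a sense of the mechanism, here is a small Python sketch using only the standard library. It shows the benign half: an entity declared in the document's own DTD gets substituted into the parsed text. The attack variant swaps the literal value for a `SYSTEM` identifier pointing at a file or URL. Whether that gets fetched depends on the parser and its configuration, so the external form below is only written out as a string and never parsed. The entity names and the file path are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text lives in the document itself.
INTERNAL = '<!DOCTYPE note [<!ENTITY who "world">]><note>hello &who;</note>'

# External entity (illustrative only, not parsed here): the replacement
# text is whatever the parser reads from the SYSTEM identifier.
EXTERNAL = (
    '<!DOCTYPE note [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>'
    "<note>&xxe;</note>"
)


def expanded_text(xml_text: str) -> str:
    """Return the root element's text after the parser has expanded entities."""
    return ET.fromstring(xml_text).text


print(expanded_text(INTERNAL))  # the &who; reference has been replaced
```

Python's `xml.etree.ElementTree` does not fetch external entities by default, as far as its documentation says; the problem is parsers and configurations that do, at which point the contents of the referenced file end up in the parsed document.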

Not just XML; JSON parsers are notoriously vulnerable as well.
Citation needed.

There have been many vulnerabilities in YAML parsers for Ruby because they let you encode actual objects / code.

JSON, despite being "JavaScript Object Notation", can't actually encode full code/objects. You only have a few datatypes: (off the top of my head) bools, strings, numbers, arrays, key/value dicts. None of these are dangerous or difficult to parse.

What you might be thinking about is the recent Ruby on Rails vulnerability which was caused by transforming JSON into YAML and then parsing the YAML. It would be more accurate to say the YAML parser was vulnerable.

Your claim that "JSON parsers are notoriously vulnerable" implies that this is a common occurrence as well, not just a single incident.

I personally don't see it as likely because JSON has pretty much no features compared to XML; the surface area is tiny.

Not exactly remote execution, but the parsing and construction of key/value dictionaries can be and has been exploited [1].

[1] http://arstechnica.com/business/2011/12/huge-portions-of-web...

Agreeing with the other statement: a JSON deserializer should never be executing arbitrary code as part of a feature of the deserializer. YAML, Python pickle, PHP serialization, etc. all allow serialization of arbitrary class instances by default, but JSON only allows simple data types.
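
A quick Python illustration of that difference, as a sketch: `json.loads` can only hand back dicts, lists, strings, numbers, booleans, and `None`, while `pickle.loads` will call whatever callable the payload names. The `Marker` class and the `json_types` helper are hypothetical, and the harmless built-in `sorted` stands in for the dangerous callable an attacker would pick.

```python
import json
import pickle


def json_types(value, seen=None):
    """Collect the names of every Python type that appears in parsed JSON."""
    seen = set() if seen is None else seen
    seen.add(type(value).__name__)
    if isinstance(value, dict):
        for item in value.values():
            json_types(item, seen)
    elif isinstance(value, list):
        for item in value:
            json_types(item, seen)
    return seen


# A key that looks like a class reference is still just a string in a dict.
parsed = json.loads('{"__class__": "os.system", "args": ["id", 1, 2.5, true, null]}')


class Marker:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # The loader makes that call itself during pickle.loads.
        return (sorted, ([3, 1, 2],))


# The result is whatever the named callable returned, not a Marker.
loaded = pickle.loads(pickle.dumps(Marker()))
```

Here `parsed` contains nothing but plain data, whereas `loaded` is the return value of a function call that the pickle payload chose.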

So, no clue where you're getting that from.

Examples like these are the reason why I like to avoid XML. Unless you're using something that actually takes advantage of the tree structure of XML and needs its features, it's really overcomplicated overkill that can bite you in the ass.

95% of the time you're just using XML like another JSON/serialization format, and you should definitely be using something just as lightweight.