by bambax 4446 days ago
> every time I write code to parse some XML

Why would you write code to parse XML?

Use an existing parser to parse.

Use XSLT to modify/transform (including generating JSON/CSV/other).


Ironically, using an existing parser is what opens you to this vulnerability in the first place. If you hack your own together based on a vague idea of what XML really is, you're very unlikely to "correctly" handle entities, you'll probably just put in enough to handle simple XHTML entities, and that makes you immune to this problem! It's the compliant parsers that are vulnerable to this....
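
To make that contrast concrete, here is a minimal sketch in Python (the language, the sample document, and the `naive_text` / `compliant_text` helpers are all illustrative assumptions, not anything from the thread). The hand-rolled version only knows the five predefined entities, so a document-defined entity passes through untouched; the standard-library parser follows the spec and expands it.

```python
import re
import xml.etree.ElementTree as ET

# Sample document that declares its own entity in an internal DTD subset.
DOC = '<!DOCTYPE r [<!ENTITY a "ha">]><r>&a;&a;&lt;</r>'

# The five entities predefined by XML; a quick hand-rolled parser
# often handles these (or a few XHTML ones) and nothing else.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def naive_text(xml_text):
    """Hypothetical hand-rolled 'parser': pull the text of <r> with a regex
    and expand only the predefined entities, leaving unknown ones as-is."""
    body = re.search(r"<r>(.*)</r>", xml_text, re.S).group(1)
    return re.sub(
        r"&(\w+);",
        lambda m: PREDEFINED.get(m.group(1), m.group(0)),
        body,
    )


def compliant_text(xml_text):
    """Spec-following parse: entities declared in the DTD are expanded."""
    return ET.fromstring(xml_text).text


print(naive_text(DOC))      # the custom entity is left unexpanded
print(compliant_text(DOC))  # the custom entity is expanded by the parser
```

The expansion in `compliant_text` is exactly the feature that nested entity definitions abuse; the naive version is "safe" only because it is incomplete.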
Or, if you use existing parsers in a language like Haskell, you know parsing is supposed to be a pure function. If parsing suddenly requires IO effects, you can be suspicious and try to figure out what is going on.
Even with Haskell, someone could sneak in an unsafePerformIO call if you aren't careful. Of course, this is trivial to detect with compiler flags etc.
We're not talking about a malicious XML library here, though. We're talking about a misunderstanding regarding what happens during legitimate parsing of XML.
I was just responding to you about pure functions. You can make a Haskell function with a pure type signature that includes a call to unsafePerformIO.
You can, but:

A) Legitimate libraries don't (unless the IO action is in fact pure)

B) Rogue libraries that do this will not generally work: laziness, optimizations, RTS races can all make the IO action run 0..N times, arbitrarily.

C) It doesn't change the fact that in Haskell, the XML library exposes the weird XML behavior of looking up external entities by being in IO (my original point) -- because of A.

More likely, they'll just write bindings to libxml2.
I wrote a libxml2 binding in Haskell (http://hackage.haskell.org/package/libxml-sax). It was an absolute nightmare, in part because handling entities safely requires a lot of hoop-jumping (and I'm not even 100% sure I caught all the places libxml2 does unsafe stuff).
"absolute nightmare" sounds like you did pretty well for libxml2.
Okay, parent comment obviously came out wrong and is starting its descent into white hell... ;-) I'm not going to delete it since it would be unfair to the child comments.

XML is for some reason a super-controversial technology that is apparently almost universally hated, and XSLT even more so. I hope I won't be downvoted even more for asking: what's scary about being downstream from a (serious, well-maintained) XML parser?

(And I love XSLT. What can I say.)

What's "scary" (not the term I would personally use) is that the libraries typically aren't safe by default against malicious use. Users of the library have to know a lot in order to make them safe. See https://bitbucket.org/tiran/defusedxml for some of the potentially nasty gotchas in XML and XML-related technologies. Quoting from it:

> None of the issues is new. They have been known for a long time. Billion laughs was first reported in 2003. Nevertheless some XML libraries and applications are still vulnerable and even heavy users of XML are surprised by these features. It's hard to say whom to blame for the situation. It's too short sighted to shift all blame on XML parsers and XML libraries for using insecure default settings. After all they properly implement XML specifications. Application developers must not rely that a library is always configured for security and potential harmful data by default.
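
As a rough illustration of what "configuring for security" can look like, here is a sketch using only Python's standard library. It is not defusedxml's implementation; `DTDForbidden` and `check_no_dtd` are hypothetical names, and the policy (reject any document that carries a DOCTYPE at all) is just one possible choice that assumes the application never needs DTDs.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when the input declares a DOCTYPE (hypothetical name)."""


def check_no_dtd(xml_text):
    """Parse xml_text with expat and refuse any document with a DOCTYPE.

    Without a DOCTYPE there are no document-defined entities, so entity
    expansion tricks have nothing to work with. Malformed input still
    raises xml.parsers.expat.ExpatError as usual.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it sees the start of a DOCTYPE.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return True


if __name__ == "__main__":
    check_no_dtd("<r>ok</r>")
    try:
        check_no_dtd('<!DOCTYPE r [<!ENTITY a "ha">]><r>&a;</r>')
    except DTDForbidden as exc:
        print("rejected:", exc)
```

The point of the sketch is the one the quote makes: the parser does nothing wrong by default, so the caller has to opt in to the restriction.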

I think cheald probably means writing code to invoke a parser to parse XML. Presumably if you had written your own parser (generally, not a great idea) the resulting behaviour would not be "scary, twisted"... [at least to the person writing the parser].
Yes, indeed. :)
He very clearly said 'write code to parse' not 'write a parser'. The former obviously USES a parser.