| HN Mirror

by bambax 4493 days ago

> every time I write code to parse some XML

Why would you write code to parse XML?

Use an existing parser to parse.

Use XSLT to modify/transform (including generate JSON/CSV/other).

4 comments

jerf 4493 days ago

Ironically, using an existing parser is what opens you to this vulnerability in the first place. If you hack your own together based on a vague idea of what XML really is, you're very unlikely to "correctly" handle entities; you'll probably just put in enough to handle simple XHTML entities, and that makes you immune to this problem! It's the compliant parsers that are vulnerable to this.
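
To make the entity point concrete, here is a minimal Python sketch, assuming the standard-library `xml.etree.ElementTree` parser. The document and the function name `expanded_text` are made up for illustration. It only shows that a compliant parser expands entities declared in the document's own DTD; an attacker's document would nest many more levels.

```python
import xml.etree.ElementTree as ET

# A tiny, harmless document whose internal DTD defines nested entities.
# (Illustrative only: a hostile document would nest far deeper.)
DOC = """<?xml version="1.0"?>
<!DOCTYPE r [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
]>
<r>&b;</r>"""


def expanded_text(xml_text: str) -> str:
    """Parse with the standard-library parser and return the root's text."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # The parser, not the application, performs the expansion.
    print(expanded_text(DOC))
```

A hand-rolled parser that only knows `&amp;`, `&lt;` and friends would never perform this expansion, which is the immunity described above.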

Peaker 4493 days ago

Or, if you use existing parsers in a language like Haskell, you know parsing is supposed to be a pure function. If parsing suddenly requires IO effects, you can be suspicious and try to figure out what is going on.

gamegoblin 4493 days ago

Even with Haskell, someone could sneak in an unsafePerformIO call if you aren't careful. Of course, this is trivial to detect with compiler flags etc.

Peaker 4493 days ago

We're not talking about a malicious XML library here, though. We're talking about a misunderstanding regarding what happens during legitimate parsing of XML.

gamegoblin 4493 days ago

I was just responding to you about pure functions. You can make a Haskell function with a pure type signature that includes a call to unsafePerformIO.

Peaker 4493 days ago

You can, but:

A) Legitimate libraries don't (unless the IO action is in fact pure)

B) Rogue libraries that do this will not generally work: laziness, optimizations, RTS races can all make the IO action run 0..N times, arbitrarily.

C) It doesn't change the fact that in Haskell, the XML library exposes the weird XML behavior of looking up external entities by being in IO (my original point) -- because of A.
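
The external-entity lookup can be seen outside Haskell too. Below is a minimal Python sketch, assuming the standard-library `xml.parsers.expat` module. The document, the `file:///hypothetical/secret.txt` path and the function name are invented for illustration. The handler only records what the parser asks for and performs no I/O itself.

```python
import xml.parsers.expat

# Hypothetical document: the entity points at a resource outside the document.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE r [
  <!ENTITY ext SYSTEM "file:///hypothetical/secret.txt">
]>
<r>&ext;</r>"""


def requested_external_entities(data: bytes) -> list:
    """Return the system identifiers the parser asks the application to fetch."""
    requested = []

    def on_external_entity(context, base, system_id, public_id):
        # Record the request only; deliberately do not open anything.
        requested.append(system_id)
        return 1  # nonzero tells expat to carry on parsing

    parser = xml.parsers.expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(data, True)
    return requested


if __name__ == "__main__":
    print(requested_external_entities(DOC))
```

Resolving that identifier is the step that needs I/O, so a library that resolves it for you cannot honestly present parsing as a pure function.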

jrockway 4493 days ago

More likely, they'll just write bindings to libxml2.

jmillikin 4492 days ago

I wrote a libxml2 binding in Haskell (http://hackage.haskell.org/package/libxml-sax). It was an absolute nightmare, in part because handling entities safely requires a lot of hoop-jumping (and I'm not even 100% sure I caught all the places libxml2 does unsafe stuff).

jrockway 4491 days ago

"absolute nightmare" sounds like you did pretty well for libxml2.

bambax 4493 days ago

Okay, parent comment obviously came out wrong and is starting its descent into white hell... ;-) I'm not going to delete it since it would be unfair to the child comments.

XML is for some reason a super-controversial technology that is apparently almost universally hated, and XSLT even more so. I hope I'll not be downvoted even more for asking: what's scary about being downstream from a (serious, well-maintained) XML parser?

(And I love XSLT. What can I say.)

dalke 4493 days ago

What's "scary" (not the term I would personally use) is that the libraries typically aren't safe by default against malicious use. Users of the library have to know a lot in order to make them safe. See https://bitbucket.org/tiran/defusedxml for some of the potentially nasty gotchas in XML and XML-related technologies. Quoting from it:

> None of the issues is new. They have been known for a long time. Billion laughs was first reported in 2003. Nevertheless some XML libraries and applications are still vulnerable and even heavy users of XML are surprised by these features. It's hard to say whom to blame for the situation. It's too short sighted to shift all blame on XML parsers and XML libraries for using insecure default settings. After all they properly implement XML specifications. Application developers must not rely that a library is always configured for security and potential harmful data by default.
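
As a rough illustration of "users have to know a lot", here is a minimal standard-library Python sketch of one common mitigation: refusing any document that carries a DOCTYPE before handing it to the real parser. This is not defusedxml's implementation. The names `DTDForbidden` and `fromstring_no_dtd` are hypothetical, and the sketch parses the input twice for simplicity.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def fromstring_no_dtd(data: bytes) -> ET.Element:
    """Parse untrusted XML, rejecting any document that declares a DOCTYPE."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # No DOCTYPE means no custom entities, internal or external.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(data, True)  # the handler's exception propagates from here

    # Only reached when no DOCTYPE was seen; now build the tree as usual.
    return ET.fromstring(data)


if __name__ == "__main__":
    print(fromstring_no_dtd(b"<r>ok</r>").text)
```

None of this is on by default in the standard parser, which is the complaint: the application author has to know the check is needed.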

arethuza 4493 days ago

I think cheald probably means writing code to invoke a parser to parse XML. Presumably if you had written your own parser (generally, not a great idea) the resulting behaviour would not be "scary, twisted"... [at least to the person writing the parser].

cheald 4493 days ago

Yes, indeed. :)

kevingadd 4493 days ago

He very clearly said 'write code to parse' not 'write a parser'. The former obviously USES a parser.
