Hacker News
by jerf 4446 days ago
Ironically, using an existing parser is what opens you to this vulnerability in the first place. If you hack your own together based on a vague idea of what XML really is, you're very unlikely to "correctly" handle entities. You'll probably put in just enough to handle simple XHTML entities, and that makes you immune to this problem! It's the compliant parsers that are vulnerable to this....

Or, if you use existing parsers in a language like Haskell, you know parsing is supposed to be a pure function. If parsing suddenly requires IO effects, you can be suspicious and try to figure out what is going on.
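
A rough sketch of what that looks like at the type level. This is a toy entity expander, not any real XML library's API; the names expandPure, expandIO, and builtins are made up for illustration. The pure version can only consult the table it is handed, while the version that accepts an arbitrary resolver has to live in IO, and that shows up in its signature.

```haskell
-- Hypothetical toy example: expand "&name;" references in a string.
-- None of these names come from a real XML library.

type Entities = [(String, String)]

-- The five predefined XML entities.
builtins :: Entities
builtins =
  [ ("lt", "<"), ("gt", ">"), ("amp", "&"), ("quot", "\""), ("apos", "'") ]

-- Pure expansion: the type rules out file or network access.
-- Unknown entities are reported as errors instead of being fetched.
expandPure :: Entities -> String -> Either String String
expandPure ents = go
  where
    go [] = Right []
    go ('&' : rest) =
      case break (== ';') rest of
        (name, ';' : rest') ->
          case lookup name ents of
            Just v  -> (v ++) <$> go rest'
            Nothing -> Left ("unknown entity: " ++ name)
        _ -> Left "unterminated entity reference"
    go (c : rest) = (c :) <$> go rest

-- Effectful expansion: the resolver may read files or hit the network,
-- so the result is in IO and the caller can see that from the type.
expandIO :: (String -> IO (Maybe String)) -> String -> IO (Either String String)
expandIO resolve = go
  where
    go [] = pure (Right [])
    go ('&' : rest) =
      case break (== ';') rest of
        (name, ';' : rest') -> do
          r <- resolve name
          case r of
            Just v  -> fmap (v ++) <$> go rest'
            Nothing -> pure (Left ("unknown entity: " ++ name))
        _ -> pure (Left "unterminated entity reference")
    go (c : rest) = fmap (c :) <$> go rest
```

With this sketch, expandPure builtins "&xxe;" just returns a Left; there is no way for it to go and fetch anything.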
Even with Haskell, someone could sneak in an unsafePerformIO call if you aren't careful. Of course, this is trivial to detect with compiler flags, etc.
We're not talking about a malicious XML library here, though. We're talking about a misunderstanding regarding what happens during legitimate parsing of XML.
I was just responding to you about pure functions. You can make a Haskell function with a pure type signature that includes a call to unsafePerformIO.
You can, but:

A) Legitimate libraries don't (unless the IO action is in fact pure)

B) Rogue libraries that do this will not generally work: laziness, optimizations, RTS races can all make the IO action run 0..N times, arbitrarily.

C) It doesn't change the fact that in Haskell, the XML library exposes the weird XML behavior of looking up external entities by being in IO (my original point) -- because of A.
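
For anyone who hasn't seen it, here is a minimal sketch of the thing being argued about: a function with a pure type that hides an IO action via unsafePerformIO. The names sneakyLength and callCount are invented for this example. The side effect is just a counter bump, and per point B nothing guarantees how many times it actually runs once laziness and optimizations get involved.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global mutable counter (hypothetical). NOINLINE keeps GHC from
-- duplicating the newIORef call by inlining this definition.
{-# NOINLINE callCount #-}
callCount :: IORef Int
callCount = unsafePerformIO (newIORef 0)

-- Looks pure from the signature, but bumps the counter as a hidden effect.
-- How often the effect fires depends on evaluation order and optimization.
sneakyLength :: String -> Int
sneakyLength s = unsafePerformIO $ do
  modifyIORef' callCount (+ 1)
  pure (length s)

-- Read back how many times the hidden effect has actually run so far.
observedCalls :: IO Int
observedCalls = readIORef callCount
```

If the result of sneakyLength is never demanded, the counter never moves, which is exactly why a rogue library can't rely on this trick behaving predictably.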

More likely, they'll just write bindings to libxml2.
I wrote a libxml2 binding in Haskell (http://hackage.haskell.org/package/libxml-sax). It was an absolute nightmare, in part because handling entities safely requires a lot of hoop-jumping (and I'm not even 100% sure I caught all the places libxml2 does unsafe stuff).
"absolute nightmare" sounds like you did pretty well for libxml2.