Ironically, using an existing parser is what opens you to this vulnerability in the first place. If you hack your own together based on a vague idea of what XML really is, you're very unlikely to "correctly" handle entities; you'll probably just put in enough to handle simple XHTML entities, and that makes you immune to this problem! It's the compliant parsers that are vulnerable to this.
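To make that concrete, here is a rough Python sketch of the kind of "just enough" entity handling a hand-rolled parser tends to end up with. The names `PREDEFINED` and `decode_entities` are made up for illustration and don't come from any real library. The point is only that there is no code path that reads a DTD or fetches anything, so an unknown reference like `&xxe;` is simply left alone.

```python
import re

# The five entities predefined by XML; a quick hand-rolled parser
# usually knows about these (plus numeric references) and nothing else.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Matches &name;, &#123; and &#x7B; style references.
_ENTITY = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Expand predefined and numeric references; leave everything else as-is."""

    def repl(match):
        ref = match.group(1)
        if ref.startswith("#x"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        # Unknown names are never looked up in a DTD, so they cannot
        # pull in external files or expand recursively.
        return PREDEFINED.get(ref, match.group(0))

    return _ENTITY.sub(repl, text)
```

So `decode_entities("a &lt; b &xxe;")` expands `&lt;` but passes `&xxe;` through verbatim. It's "wrong" as far as the spec is concerned, and that is exactly why it can't be tricked.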
Or, if you use existing parsers in a language like Haskell, you know parsing is supposed to be a pure function. If parsing suddenly requires IO effects, you can be suspicious and try to figure out what is going on.
We're not talking about a malicious XML library here, though. We're talking about a misunderstanding regarding what happens during legitimate parsing of XML.
A) Legitimate libraries don't (unless the IO action is in fact pure)
B) Rogue libraries that do this will not generally work: laziness, optimizations, RTS races can all make the IO action run 0..N times, arbitrarily.
C) It doesn't change the fact that in Haskell, the XML library exposes the weird XML behavior of looking up external entities by being in IO (my original point) -- because of A.
I wrote a libxml2 binding in Haskell (http://hackage.haskell.org/package/libxml-sax). It was an absolute nightmare, in part because handling entities safely requires a lot of hoop-jumping (and I'm not even 100% sure I caught all the places libxml2 does unsafe stuff).
Okay, parent comment obviously came out wrong and is starting its descent into white hell... ;-) I'm not going to delete it since it would be unfair to the child comments.
XML is for some reason a super-controversial technology that is apparently almost universally hated, and XSLT even more so. I hope I won't be downvoted even more for asking: what's scary about being downstream from a (serious, well-maintained) XML parser?
What's "scary" (not the term I would personally use) is that the libraries typically aren't safe by default against malicious use. Users of the library have to know a lot in order to make them safe. See https://bitbucket.org/tiran/defusedxml for some of the potentially nasty gotchas in XML and XML-related technologies. Quoting from it:
> None of the issues is new. They have been known for a long time. Billion laughs was first reported in 2003. Nevertheless some XML libraries and applications are still vulnerable and even heavy users of XML are surprised by these features. It's hard to say whom to blame for the situation. It's too short sighted to shift all blame on XML parsers and XML libraries for using insecure default settings. After all they properly implement XML specifications. Application developers must not rely that a library is always configured for security and potential harmful data by default.
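For what it's worth, the "don't trust the defaults" idea can be sketched with nothing but the Python standard library. This is not defusedxml's actual code, just a minimal illustration that assumes you can afford to reject every document carrying a DOCTYPE. `DTDForbidden` (defined locally here) and `parse_without_dtd` are hypothetical names for the example.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when the input contains a DOCTYPE declaration."""


def parse_without_dtd(data):
    """Parse XML bytes and return element names, refusing any DOCTYPE."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # No DTD means no custom entities, so neither billion laughs
        # nor external entity lookups can get started.
        raise DTDForbidden("DOCTYPE %r is not allowed" % name)

    def on_start(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # exceptions raised in handlers propagate out
    return names
```

An ordinary document like `b"<a><b/></a>"` parses as usual, while anything starting with `<!DOCTYPE ...>` raises `DTDForbidden` before a single entity is expanded. The catch is the one the quote describes: you have to know to set this up yourself.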
I think cheald probably means writing code to invoke a parser to parse XML. Presumably if you had written your own parser (generally, not a great idea) the resulting behaviour would not be "scary, twisted"... [at least to the person writing the parser].