by twoodfin 1486 days ago

This is just so basic a screwup though. The W3C spec for XML has had a formal syntactic description of valid tag names for decades:

https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-common-sy...

Plenty of libraries get this right because it’s so easy. You’d almost have to try—probably by being “clever”—to get it wrong.
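For a sense of how compact that grammar is, here is a rough Python sketch of the `Name` production. The character ranges are my transcription of the `NameStartChar` and `NameChar` productions and should be checked against the spec; `is_xml_name` is a made-up helper name, not anything from a real library.

```python
import re

# NameStartChar ranges, transcribed from the XML grammar.
# Assumption: verify these against the spec before relying on them.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, the middle dot and two combining ranges.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
XML_NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if the whole string matches the Name production."""
    return XML_NAME.fullmatch(candidate) is not None
```

Using `fullmatch` rather than `match` matters here: a prefix that looks like a valid name followed by `<` or whitespace has to be rejected, not accepted up to the first bad character.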

2 comments

lucideer 1486 days ago

While I'm not defending the screw-up here (it's bad), it does it a slight injustice to omit that the issue was not something simplistic around ASCII/UTF-8 parsing but rather a failure to reject or escape malformed UTF-8 strings. Unicode handling, even in actual programming language implementations, is an extremely common and well-documented source of problems.
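To illustrate what "reject malformed UTF-8" means in practice, here is a minimal Python sketch. It is not the code from the incident being discussed; `decode_strict` is a hypothetical helper, and it simply leans on Python's built-in strict decoder to refuse overlong encodings, truncated sequences and encoded surrogates before any tag-name handling happens.

```python
from typing import Optional


def decode_strict(raw: bytes) -> Optional[str]:
    """Decode bytes as UTF-8, returning None if the input is malformed.

    Python's strict decoder rejects overlong forms, truncated sequences
    and encoded surrogates, so bad input never reaches later stages.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

The risky alternative is decoding leniently (for example with `errors="ignore"` or `errors="replace"`) and validating afterwards, because the validated string may then differ from the bytes another component sees.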

remus 1486 days ago

I think it's worth remembering that XML parsing is also a big historic source of bugs, which suggests to me that while it may look simple and well-formed on the surface, it's probably a lot harder than it looks.

funcDropShadow 1486 days ago

Could you give examples? There were plenty of problems with certain standards layered atop XML, or with self-made implementations of XML parsers and unparsers [1], but there is also a well-tested set of standards-compliant XML libraries that avoid those issues.

[1]: An internationally known consulting firm, which I won't name, had (perhaps still has) an internal tool that compiles an Excel description of a service interface into actual XML parsing code that accepts only one hard-coded namespace alias for each given namespace. Over the years I've come across multiple companies with that bug in some service. Every time I looked into it, the cause was the same internal tool of that consulting firm. And several times I've met people who had already discovered the same thing.
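A small Python sketch of why a hard-coded alias is a bug: the prefix is only a local shorthand, and a namespace-aware parser resolves it to the namespace URI. The URI `urn:example:orders`, the sample documents and `order_id` are all invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:orders"  # hypothetical namespace URI

# Three spellings of the same document: two different prefixes
# and a default namespace declaration.
doc_a = f'<o:order xmlns:o="{NS}"><o:id>7</o:id></o:order>'
doc_b = f'<x:order xmlns:x="{NS}"><x:id>7</x:id></x:order>'
doc_c = f'<order xmlns="{NS}"><id>7</id></order>'


def order_id(xml_text: str):
    """Look up the id child by namespace URI, never by prefix."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{NS}}}id")
    return node.text if node is not None else None
```

All three documents yield the same result from `order_id`, whereas code that matches the literal string `o:id` would only accept the first.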

lucideer 1486 days ago

I have the same question as the sibling commenter: are you sure you mean parsing (i.e. well-formedness) and not handling (i.e. logic to do things with the parsed data, e.g. XXE, namespace separation, etc.)?

Obviously all software has some bugs, and I'm sure XML parsers are no exception, but I haven't been personally aware of any high-profile ones before this.

For a quick example of a lowish-level XML bug that isn't parsing-related: many years ago I reported a bug in a piece of software whereby attributes without CURIE prefixes were being placed into the wrong namespace. A weird quirk of XML namespaces is that unprefixed tags go into the default namespace but unprefixed attributes go into a "NULL" namespace (or, if I recall correctly, sometimes a specific namespace depending on the tag?). That's not a parser bug though, since the parser has parsed the tag, attributes and associated prefix strings (or lack thereof) correctly: it just does something wrong post-parsing.

I feel like that class of bug is very common with XML, but it's more of an application stability concern than a security one (XXE being a notable exception just because it deals with IO).
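The attribute quirk is easy to see with Python's ElementTree, which reports names as `{uri}local`. This is only a sketch with made-up namespace URIs and a hypothetical `describe` helper: the unprefixed element picks up the default namespace, while the unprefixed attribute comes back with no namespace at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace, one prefixed namespace,
# one unprefixed attribute and one prefixed attribute.
DOC = (
    '<item xmlns="urn:example:default" xmlns:p="urn:example:p" '
    'plain="1" p:marked="2"/>'
)


def describe(xml_text: str):
    """Return the expanded element name and attribute names."""
    root = ET.fromstring(xml_text)
    return root.tag, dict(root.attrib)


tag, attrs = describe(DOC)
# tag   -> "{urn:example:default}item"  (default namespace applies)
# attrs -> {"plain": "1", "{urn:example:p}marked": "2"}
#          (the default namespace does NOT apply to "plain")
```

Code that assumes `plain` lives in `urn:example:default` will look it up under the wrong key, which is the post-parsing class of bug described above.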

mwcampbell 1486 days ago

IMO the best response to this kind of analysis is to humbly realize that any of us, working under real-world pressures, could make such a screw-up, and contemplate how we'll remain vigilant and mitigate the damage that comes from our inevitable screw-ups.
