Hacker News
by tptacek 2015 days ago
That doesn't seem accurate at all. It would be the case if there were some deterministic abbreviation from URL namespace qualifiers down to namespace prefixes, but there is not. Instead, prefixes are template variables, which can be shuffled throughout an XML document, so security software has to constantly and reliably keep track of the value of each variable at multiple points. People sign URLs and JSON documents all the time with schemes that don't have this goofy property.
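Here is a minimal sketch of the rebinding problem, using Python's standard `xml.etree.ElementTree`. The document and the `urn:example:*` URIs are made up for illustration. The same prefix `a` names two different namespaces at different depths, so anything that reasons about the prefix string rather than the resolved URI (here, a deliberately naive regex) concludes that both elements share a namespace when they don't.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: prefix "a" is bound to one URI on the outer element
# and silently rebound to a different URI on the inner element.
DOC = (
    '<a:Envelope xmlns:a="urn:example:trusted">'
    '<a:Payload xmlns:a="urn:example:other"/>'
    '</a:Envelope>'
)

def prefixed_names(text):
    """Naive view: qualified names exactly as written in the start tags."""
    return re.findall(r"<([A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*)", text)

def expanded_names(text):
    """Namespace-aware view: ElementTree reports {namespace-URI}local-name."""
    return [el.tag for el in ET.fromstring(text).iter()]

print(prefixed_names(DOC))  # ['a:Envelope', 'a:Payload']
print(expanded_names(DOC))  # ['{urn:example:trusted}Envelope', '{urn:example:other}Payload']
```

A check that only looks for "elements under the `a:` prefix" would be fooled here; the expanded names show the two elements belong to different namespaces.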

There's a similar problem with XML entity references, which have been happily breaking enterprise security for over a decade, because nobody has a good mental model of how entities in XML documents actually behave.
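To illustrate how nested entity references behave, this small Python sketch (the `nested_entity_doc` helper is hypothetical) declares entities that each reference the previous one twice. Every level doubles the expanded text, which is the mechanism behind "billion laughs" style attacks. The sizes are kept tiny on purpose, and newer expat releases add amplification limits, so this is only a demonstration of the growth, not of an attack at scale.

```python
import xml.etree.ElementTree as ET

def nested_entity_doc(levels):
    """Build a document whose entities each reference the previous one twice.

    e0 is the literal "x"; e1 is "&e0;&e0;"; and so on, so e<levels>
    expands to 2**levels copies of "x" while the input stays small.
    """
    decls = ['<!ENTITY e0 "x">']
    for i in range(1, levels + 1):
        decls.append(f'<!ENTITY e{i} "&e{i-1};&e{i-1};">')
    return f"<!DOCTYPE r [{''.join(decls)}]><r>&e{levels};</r>"

for n in (4, 8, 12):
    doc = nested_entity_doc(n)
    expanded = ET.fromstring(doc).text
    # Input grows linearly with n; expanded length is 2**n (16, 256, 4096).
    print(n, len(doc), len(expanded))
```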

It seems fair at this point to blame the standard.


In hindsight, it probably would have been better to define standard prefixes, let people register their own for non-standard ones in whatever manner suits their particular top-level document type, and, if two parties somewhere did finally manage to stomp on each other, let the document type where that happened deal with it.

While technically suboptimal compared to what currently exists, it would match people's expectations better. In practice (I can't speak for everybody), I just don't see many documents with hundreds of namespaces, where collisions would be a realistic possibility. And when I do see documents with a lot of namespaces (XMPP, for instance, or XHTML+SVG+some other thing), there's still a top-level type that could keep its own registry just fine. A bit of guidance on naming extensions ("don't call it e:; work your name in somehow, like with your company's initial") would probably have solved 99.9% of the problem.
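The registry idea might look something like this Python sketch. `REGISTRY` and `check_prefixes` are hypothetical names, and the policy (exactly one allowed URI per prefix for a given top-level document type) is an assumption about how such a scheme could work, not anything in the Namespaces spec. It uses `ElementTree.iterparse` with `start-ns` events to see each prefix declaration as it is made.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical registry for one top-level document type: each prefix
# ("" is the default namespace) has exactly one allowed namespace URI.
REGISTRY = {
    "": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
    "acme": "urn:example:acme-extension",
}

def check_prefixes(text, registry=REGISTRY):
    """Return the (prefix, uri) declarations that disagree with the registry."""
    problems = []
    source = io.BytesIO(text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if registry.get(prefix) != uri:
            problems.append((prefix, uri))
    return problems

ok = ('<html xmlns="http://www.w3.org/1999/xhtml" '
      'xmlns:svg="http://www.w3.org/2000/svg"><svg:svg/></html>')
bad = ('<html xmlns="http://www.w3.org/1999/xhtml" '
       'xmlns:svg="urn:example:not-svg"><svg:svg/></html>')
print(check_prefixes(ok))   # []
print(check_prefixes(bad))  # [('svg', 'urn:example:not-svg')]
```

Under this policy a prefix is effectively a fixed name again, so code could compare prefixes directly without tracking bindings through the document.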

Prior to seeing what happened I'd probably still have argued for the current namespaces spec. In principle it doesn't seem that complicated to me. But I'm obviously wrong in practice, because, like I said, I can hardly cite an example of them being used correctly at all.

(Likewise, in hindsight, entities shouldn't have been able to be recursive, and if we were spec'ing out the next generation of XML I'd straight-up remove them except for the ones necessary to XML itself (&lt;, &gt;, and &amp;), because Unicode covers the major use case of entities now. I'd discard the "terrible, terrible templating language" use case entirely.)
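A quick Python check of that claim, assuming a document with no DTD: the predefined entities still cover the characters XML syntax needs, raw UTF-8 text gives the same result as a numeric character reference, and a named entity such as &eacute; is simply rejected because nothing declares it. `parse_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def parse_text(doc):
    """Parse a DTD-less document and return the root element's text."""
    return ET.fromstring(doc).text

# Predefined entities still cover the characters XML syntax itself needs.
print(parse_text("<p>a &lt; b &amp;&amp; c &gt; d</p>"))  # a < b && c > d

# A numeric character reference and raw UTF-8 produce the same string.
print(parse_text("<p>caf&#233;</p>") == parse_text("<p>café</p>"))  # True

# A named entity like &eacute; has no declaration here, so parsing fails.
try:
    parse_text("<p>caf&eacute;</p>")
except ET.ParseError:
    print("rejected: undefined entity")
```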

> In principle it doesn't seem that complicated to me. But I'm obviously wrong in practice, because, like I said, I can hardly cite an example of them being used correctly at all.

A snarky-but-mostly-true oversimplification: the complexity was necessary because XML was supposed to become a machine-readable interchange format for everything but it ended up not becoming that due to the complexity.

> instead, they are template variables, which can be shuffled throughout an XML document, requiring security software to constantly and reliably keep track of the value of the variable at multiple points

Isn't the issue here that they are mixing this templating with the business logic? They should be fine if the XML parser (or some post-processing step) expanded the namespaces so that the business logic never saw the prefixes at all.
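A sketch of that separation in Python, with a made-up order format and namespace URI: the parser resolves prefixes, and the business logic (`order_total`, hypothetical) matches only on `{namespace}local-name` tags. The choice of prefix, or using a default namespace instead, can't change the result, while binding the prefix to a different URI does.

```python
import xml.etree.ElementTree as ET

ORDER_NS = "urn:example:orders"  # hypothetical namespace URI

def order_total(text):
    """Business logic that only ever sees expanded {namespace}local-name tags."""
    root = ET.fromstring(text)
    item_tag = f"{{{ORDER_NS}}}item"  # -> "{urn:example:orders}item"
    return sum(int(item.get("qty")) for item in root.iter(item_tag))

# Same namespace, different prefix conventions.
doc_a = '<o:order xmlns:o="urn:example:orders"><o:item qty="2"/><o:item qty="3"/></o:order>'
doc_b = '<order xmlns="urn:example:orders"><item qty="2"/><item qty="3"/></order>'
# Same prefix as doc_a, but bound to a different namespace.
doc_c = '<o:order xmlns:o="urn:example:not-orders"><o:item qty="9"/></o:order>'

print(order_total(doc_a), order_total(doc_b), order_total(doc_c))  # 5 5 0
```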

> People sign URLs and JSON documents all the time with schemes that don't have this goofy property.

Similarly, that might be a design issue. They should only sign documents they 100% built and serialized themselves, so that they fully control the set of tags and namespaces.
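One way to read "sign only what you serialized yourself" is sketched below in Python. It swaps in a much simpler technique than XML Signature: an HMAC over the exact bytes produced by `ET.tostring`, verified before those bytes are ever parsed, with no canonicalization step. `KEY`, `build_and_sign`, and `verify_then_parse` are hypothetical.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"hypothetical-shared-secret"  # placeholder, not a real key-management scheme
NS = "urn:example:orders"            # hypothetical namespace URI

def build_and_sign(amount):
    """Build the document ourselves, serialize it, and sign those exact bytes."""
    root = ET.Element(f"{{{NS}}}order")
    ET.SubElement(root, f"{{{NS}}}amount").text = str(amount)
    body = ET.tostring(root)  # bytes; ElementTree picks the prefix itself
    tag = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return body, tag

def verify_then_parse(body, tag):
    """Check the MAC over the raw bytes first; only then hand them to the parser."""
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("signature mismatch")
    return ET.fromstring(body)

body, tag = build_and_sign(42)
print(verify_then_parse(body, tag).find(f"{{{NS}}}amount").text)  # 42
```

Because verification happens on bytes before parsing, prefix games or entity tricks in a tampered document never reach the code that interprets it; any change to the bytes fails the check.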

> That doesn't seem accurate at all. It would be the case if there was some deterministic abbreviation from URL namespace qualifiers down to namespace prefixes, but there is not;

I'm not sure what you mean by that, tbh. It seems to me that namespace expansion is absolutely straightforward and deterministic. There are scopes, yes, but they're well defined too (if that's what you mean).

Yes, you are describing the same feature I am with slightly different words. It obviously causes problems. You could describe XML entity expansion in simple terms too, and it would remain one of the major causes of game-over vulnerabilities in enterprise software over the last decade.

Well, yeah, true.

I believe it's mostly implementation and popularisation problems.

The W3C specs surrounding XML/XPath/XSLT/RDF, etc. are very well designed, but it's possible to appreciate them only after you spend a ridiculously unreasonable amount of time reading them and putting them all together. Otherwise they look like a stupid pile of complexity with no purpose.

And what upsets me the most is the lack of really good libraries; everything I've worked with just sucks so much.

I still hope that maybe in 5-15 years things will change.