Hacker News new | ask | show | jobs
by lyxsus 2018 days ago
> This breaks people's mental models of how XML works, and correspondingly, breaks people's code that manipulates XML.

Because they usually have incorrect mental model. Blaming namespaces for name ambiguity would be the same as blaming the code "x = a + b" because "a" and "b" could be defined differently.

Namespace prefixes are absolutely irrelevant, they only exists for your convenience.

2 comments

That doesn't seem accurate at all. It would be the case if there was some deterministic abbreviation from URL namespace qualifiers down to namespace prefixes, but there is not; instead, they are template variables, which can be shuffled throughout an XML document, requiring security software to constantly and reliably keep track of the value of the variable at multiple points. People sign URLs and JSON documents all the time with schemes that don't have this goofy property.

There's a similar problem with XML entity references, which have been happily breaking enterprise security for over a decade, because nobody has a good mental model of how entities in XML documents actually behave.

It seems fair at this point to blame the standard.

In hindsight, it probably would have been better to define standard prefixes, let people just sort of register their own for non-standard ones in whatever manner is suitable for their particular top-level document type, and if somebody, somewhere out there did finally manage to stomp on each other, let that particular type of document where that happened deal with it.

While technically suboptimal compared to what currently exists, it would match people's expectations better, and in practice, I can't speak for everybody, but I just don't see a whole lot of documents with hundreds+ namespaces such that collisions are a realistic possibility. And when I do see documents with a lot of namespaces (XMPP, for instance, or XHTML+SVG+some other thing), there's still a top-level type that could keep its own registry just fine. A bit of guidance on naming extensions probably ("don't call it e:, work your name in somehow like with the initial of your company or something") would have 99.9% solved the problem.

Prior to seeing what happened I'd probably still have argued for the current namespaces spec. In principle it doesn't seem that complicated to me. But I'm obviously wrong in practice, because, like I said, I can hardly cite an example of them being used correctly at all.

(Likewise, in hindsight, entities shouldn't have been able to be recursive, and if we were spec'ing out the next generation of XML I'd straght-up remove them except for the ones necessary to XML itself, <, >, and & because UTF covers the major use case of entities now. I'd discard the "terrible, terrible templating language" use case entirely.)

In principle it doesn't seem that complicated to me. But I'm obviously wrong in practice, because, like I said, I can hardly cite an example of them being used correctly at all.

A snarky-but-mostly-true oversimplification: the complexity was necessary because XML was supposed to become a machine-readable interchange format for everything but it ended up not becoming that due to the complexity.

> instead, they are template variables, which can be shuffled throughout an XML document, requiring security software to constantly and reliably keep track of the value of the variable at multiple points

Isn't the issue here that they are mixing this templating with the business logic? They should be fine if the XML parser (or some post-processing) expanded the namespaces and business logic didn't see them at all.

> People sign URLs and JSON documents all the time with schemes that don't have this goofy property.

Similarly, that might be a design issue. They should only sign documents they 100% built and serialized themselves, so the set of tags and namespaces.

> That doesn't seem accurate at all. It would be the case if there was some deterministic abbreviation from URL namespace qualifiers down to namespace prefixes, but there is not;

I'm not sure what you mean by that, tbh. It seems to me that namespace expansion is absolutely straightforward and deterministic. There're scopes, yes, but they're too well-defined (if that's what you mean).

Yes, you are describing the same feature I am with slightly different words. It obviously causes problems. You could describe XML entity expansion in simple terms too, and it would remain one of the major causes of game-over vulnerabilities in enterprise software over the last decade.
Well, yeah, true.

I believe it's mostly implementation and popularisation problems.

The w3c specs surrounding xml/xpath/xslt/rdf and etc are very well designed but it's possible to appreciate them only after you spend ridiculously unreasonable amount of time reading and putting them all together. Otherwise it looks like a stupid pile of complexity with no purpose.

And what upsets me the most is the lack of really good libraries, everything I worked with just sucks so much.

I still have a hope that maybe in 5-15 years things will change.

>Namespace prefixes are absolutely irrelevant, they only exists for your convenience.

This is false. As soon as you need XML canonicalization you very much need those prefices exactly as they were present in the original document.

It doesn't affect data model encoded in document even a tiny bit. Namespace prefixes are irrelevant. If changing these prefixes breaks the program, the program is incorrect.
Again, this is false.

“The C14N-20000119 Canonical XML draft described a method for rewriting namespace prefixes such that two documents having logically equivalent namespace declarations would also have identical namespace prefixes. The goal was to eliminate dependence on the particular namespace prefixes in a document when testing for logical equivalence. However, there now exist a number of contexts in which namespace prefixes can impart information value in an XML document. For example, an XPath expression in an attribute value or element content can reference a namespace prefix. Thus, rewriting the namespace prefixes would damage such a document by changing its meaning (and it cannot be logically equivalent if its meaning has changed).”

https://www.w3.org/TR/xml-c14n/#NoNSPrefixRewriting

> However, there now exist a number of contexts in which namespace prefixes can impart information value in an XML document.

Well, yeah. They've given up to a mass amount of half-ass implementations? So what? I think it's our moral duty to ignore it :)

DTD does not know about namespaces and checks against "prefix:local-name".

E.g. the xhtml dtd will not accept this:

  <h:html xmlns:h="http://www.w3.org/1999/xhtml"/>
If you want to change prefixes, use XML Schema or Relax NG.
DTD is the devil spawn. Devil here being massive security vulnerabilities.
I would say use XML Schema at least. DTD looks alien to XML anyway.