|
|
|
|
|
by masklinn
2015 days ago
|
|
> I expect that a similar problem will be found in many other libraries, if the XML was publicized. XML namespaces made a critical... "mistake" is probably too strong, but "design choice that deviated too far from people's mental model" is about right... that has prevented them from being anywhere near as useful or safe as they could be. In an XML document using XML namespaces, "ns1:tagname" may not equal "ns1:tagname", and "ns1:tagname" can be equal to "ns2:tagname". This breaks people's mental models of how XML works, and correspondingly, breaks people's code that manipulates XML. That right there is why I like Clark's notation (despite its unholy verbosity), which I learned of because that's how ElementTree manipulates namespaces: in Clark's notation, the document is conceptually <{https://sample.com/1}tag>
<{https://blah.org/1}tag>
<{https://blah.org/2}tag>
<{https://anewsite.com/xmlns}tag />
</{https://blah.org/2}tag>
</{https://blah.org/1}tag>
</{https://sample.com/1}tag>
Which is unambiguous. But as you note adds challenges for round-trip equality (in fact ElementTree doesn't maintain that, it simply discards the namespace prefixes on parsing, which I have seen outright break supposedly XML parsers which were really hard-coded for specific namespace prefixes).lxml does round-trip prefixes (though it still doesn't round-trip documents) by including a namespace map on each element. |
|
And if you try to combine namespaces with DTDs (which is just an explosive mix to start with, and I think is just recommended to never do) you get other problems, because you're no longer allowed to add arbitrary namespace declarations in the middle, so anything that round-trips prefixes but might ever add redundant declarations of them won't reliably produce something that DTD-validates, and if you're transforming into a DTD from something that might have used other namespaces, you have to make sure to remove all the extra declarations, and…
Note that most of this is still “well-defined”, it's just awkwardly hairy. This is not to be taken as an excuse to implement the standard badly or incorrectly if you're going to handle it at all.