by asavinov 2853 days ago
Typing (enforcing constraints) is an important aspect. But even without typing, XML has one fundamental flaw: you cannot (correctly) represent tuples whose attributes are sets. In XML, tuple attributes are properties, for example:

    <book id="1" title="XML"> -- object or object
Now let us assume that I want to have a list of authors as a property:

    <book id="1" title="XML" authors=["Me", "My Friend"]> -- NOT SUPPORTED
Therefore, we use a workaround (a crime actually):

    <book id="1" title="XML"> 
        <author>Me</author>
        <author>My friend</author>
    </book> 
Now a book is a collection where authors are members (IS-IN relationship). This is not what we wanted. Our goal was to have an attribute "authors" which is a collection.
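
To make the asymmetry concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. The helper name `authors_of` is hypothetical, not part of any API; it only illustrates that attribute values arrive as flat strings, while the author list has to be recovered from the child elements by naming convention.

```python
import xml.etree.ElementTree as ET

DOC = """
<book id="1" title="XML">
    <author>Me</author>
    <author>My friend</author>
</book>
"""

def authors_of(book):
    """Hypothetical helper: collect <author> children by naming convention."""
    return [a.text for a in book.findall("author")]

book = ET.fromstring(DOC.strip())
attrs = dict(book.attrib)    # attributes: a flat mapping of string to string
authors = authors_of(book)   # the "collection-valued attribute" lives among the children

print(attrs)
print(authors)
```

Nothing in the parsed tree says that the `<author>` children together form one property of the book; that knowledge sits in `authors_of`, i.e. in the application.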

Pretty much any time I've seen something like this, the relevant inner element of the book is:

  <authors>
    <author>...
    <author>...
  </authors>
Which reminds me of the picture[1] on the cat-v.org section about XML.

[1] http://harmful.cat-v.org/software/xml/xml_ascent.png

The article "XML Sucks" containing that Ward Cunningham image is interesting:

http://harmful.cat-v.org/software/xml/

It includes this classic James Clark quote:

“Any damn fool could produce a better data format than XML” – James Clark 2007-04-06

Which Tim Bray discussed in "Any Damn Fool":

https://www.tbray.org/ongoing/When/200x/2007/04/08/James-Cla...

Tim Bray recommended James Clark's Random Thoughts about "Do we need a new kind of schema language?": "If you’re doing original work around the intersection of messaging and programming, you need to read it and think about it."

http://blog.jclark.com/2007/04/do-we-need-new-kind-of-schema...

James Clark writes:

>Some people propose solving the XML-processing problem by adopting an XML-centric processing model, for which the leading technologies are XQuery and XSLT2. The fundamental problem here is the XQuery/XPath data model. I'm not criticizing the WGs' efforts: they've done about as good a job as could be done given the constraints they were working under. But there is no way it can overcome the constraint that a data model based around XML and XSD is just not very good data model for general-purpose computing. The structures of XML (attributes, elements and text) are those of SGML and these come from the world of markup. Considered as general purpose data structures, they suck pretty badly. There's a fundamental lack of composability. Why do we need both elements and attributes? Why can't attributes contain elements? Why is the type of thing that can occur as the content of an element not the same as the type of thing that can occur as a document? Why do we still have cruft like processing instructions and DTDs? XSD makes a (misguided in my view) attempt to add a OO/programming language veneer on top. But it can't solve the basic problems, and, in my view, this veneer ends up making things worse not better.

>I think there's some real progress being made in the programming language world. In particular I would single out Microsoft's LINQ work. My doubts on this are with its emphasis on static typing. While I think static typing is a invaluable within a single, controlled system, I think for a distributed system the costs in terms of tight coupling often outweigh the benefits. I believe this is less of the case if the typing is structural rather than nominal. But although LINQ (or at least newer versions of C#) have introduced some welcome structural typing features, nominal typing is still thoroughly dominant.

>In the Java world, there's been a depressing lack of innovation at the language level from Sun; outside of Sun, I would single out Scala from EPFL (which can run on a JVM). This adds some nice functional features which are smoothly integrated with Java-ish OO features. XML is fundamentally not OO: XML is all about separating data from processing, whereas OO is all about combining data and processing. Functional programming is a much better fit for XML: the problem is making it usable by the average programmer, for whom the functional programming mindset is very foreign.

More words from James Clark on JSON: "Yay" and "Sigh"

http://blog.jclark.com/2010/11/xml-vs-web_24.html

>If other formats start to supplant XML, and they support these goals better than XML, I will be happy rather than worried.

>From this perspective, my reaction to JSON is a combination of "Yay" and "Sigh".

>It's "Yay", because for important use cases JSON is dramatically better than XML. In particular, JSON shines as a programming language-independent representation of typical programming language data structures. This is an incredibly important use case and it would be hard to overstate how appallingly bad XML is for this. The fundamental problem is the mismatch between programming language data structures and the XML element/attribute data model of elements. This leaves the developer with three choices, all unappetising:

>live with an inconvenient element/attribute representation of the data;

>descend into XML Schema hell in the company of your favourite data binding tool;

>write reams of code to convert the XML into a convenient data structure.

>By contrast with JSON, especially with a dynamic programming language, you can get a reasonable in-memory representation just by calling a library function.
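
A minimal sketch of that last point, not taken from Clark's post: with Python's standard json module, the book record from earlier in the thread (written here as JSON, which is my assumption about how it would be encoded) becomes a dict with a real list in it after a single call.

```python
import json

# Assumed JSON encoding of the book example from the top of the thread.
TEXT = '{"id": 1, "title": "XML", "authors": ["Me", "My Friend"]}'

book = json.loads(TEXT)  # one library call, no schema or data binding tool

print(book["authors"])
print(type(book["authors"]).__name__)
```

Here `authors` is a property of the book whose value is a collection, which is the shape the original comment was asking for.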

It is again a workaround because <authors> is a member of a collection - it is not a tuple attribute. Which suggests that we cannot enforce this separation and an application has to understand itself which element is an object property and which element is a member of a collection.
You're trying to impose an arbitrary model on XML.

XML grew from markup languages. It's as if we took some text, parsed it, and then stored the resulting syntax tree along with the text. Here elements are non-terminals of the grammar and element attributes are additional augmenting properties of those non-terminals.

The text, of course, does not have to be human-readable; it can very well be a sequence of anything, for example, of bits, bytes, words, etc. In XML we'd have to represent these with special elements, something like <byte value="00" /> or <int32 value="12345" />, but once we do this, we can use XML to enclose them into additional non-terminals that tell us what these bytes mean and let us use automated tools to manipulate them.

XML can represent objects, but in its own way: we have to first sort of serialize our object into a sequence and once we have this sequence, we can use XML. The model you seem to be talking about is an abstract model of abstract objects in memory. Although memory is technically sequential, we normally ignore this and treat objects as nodes in some graph. This is not the domain of XML; it has to be a sequence to begin with. XML has a concept of IDs and references to IDs and thus can represent graphs reasonably well, but it must be a serialized graph.

So XML is basically a language to express the underlying grammatical structure of an arbitrary sequence. That structure is a tree, but it is based on a sequence nonetheless. It's not quite what abstract objects are in programming; but if you think of files, for example, files are sequences and thus are totally the domain of XML.
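
The point about IDs and references can be sketched in Python with xml.etree.ElementTree. The `<graph>`, `<node>` and `<edge>` vocabulary and the `id`/`ref` attribute names are made up for illustration; without a DTD or schema the parser treats them as ordinary attributes, so the code resolves the references itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <node id="..."> elements with <edge ref="..."/> children.
DOC = """
<graph>
    <node id="a"><edge ref="b"/><edge ref="c"/></node>
    <node id="b"><edge ref="a"/></node>
    <node id="c"/>
</graph>
"""

root = ET.fromstring(DOC.strip())
nodes = root.findall("node")

# Collect the declared ids, then rebuild the adjacency mapping from the refs.
ids = {n.get("id") for n in nodes}
graph = {
    n.get("id"): [e.get("ref") for e in n.findall("edge") if e.get("ref") in ids]
    for n in nodes
}

print(graph)
```

The document itself is still a tree over a sequence; the cycle between `a` and `b` only exists once the references have been resolved in memory.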

> It is again a workaround because <authors> is a member of a collection - it is not a tuple attribute.

It's an element, not an attribute, but the set of elements and the set of attributes are both collections.

> Which suggests that we cannot enforce this separation and an application has to understand itself which element is an object property and which element is a member of a collection

It's true that that is not part of bare XML and the nature of attributes vs. elements, even though you want to impose it there.

OTOH to the extent that those words correspond to a well-rounded semantic distinction, it is arguably captured in schema languages, and not mere application-level knowledge.

Even more fun: XML attributes undergo "whitespace normalization"!

https://stackoverflow.com/questions/260436/preserving-attrib...
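
A quick way to see this from Python, as a sketch with xml.etree.ElementTree (which uses the expat parser): literal whitespace characters inside an attribute value are each replaced by a space, while the same characters written as character references survive.

```python
import xml.etree.ElementTree as ET

# A literal newline and tab inside an attribute value...
literal = ET.fromstring('<a v="x\ny\tz"/>')
# ...versus the same characters written as character references.
escaped = ET.fromstring('<a v="x&#10;y&#9;z"/>')

print(repr(literal.get("v")))  # 'x y z'
print(repr(escaped.get("v")))  # 'x\ny\tz'
```

So a value with significant line breaks has to be escaped on the way out, or it will not round-trip through an attribute.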