Hacker News
by dissenter 6512 days ago
XML is an embarrassment. It solves no problem: not even the problem of agreeing on how to represent data. The only thing it does is give programmers something recognizable to fiddle with, however irrelevant to the problem it may be.

Most religious wars in computer science hinge on matters of taste. If you prefer emacs to vi, maybe that's just your style. If you prefer PHP to Ruby, there may be several good reasons why.

There is no such ambiguity in the case of XML. If you prefer XML to anything but XML, you don't know what you're talking about. You should have no say in anything that affects other programmers.

We're in this mess because of the unforeseen popularity of the web. When the web was created, its designers chose a simple and not particularly good markup language. Then the web grew, and instead of everybody recognizing the language as bad and replacing it, we turned a blind eye to its faults and kept it around.

The immense popularity of the web has glossed over all the deficiencies present in markup languages. People can't imagine that anything that built the internet might have something wrong with it. The internet is good, so anything that built the internet must be good as well.

Markup language was ill-conceived. Generalizing it into XML was folly.

How can you possibly take XML seriously? How do you squeeze an entire blog post out of it? Have you never bothered to look at the technology? The author is obviously capable of writing a coherent, well-thought-out essay. Did he never stop, look at what he was doing, and go, "This is a whole lot of shit!"?

5 comments

Maybe I am missing something, but I think that's a little extreme...

If I have some data structures that I am trying to send from my C# windows app to your java unix app, how would you propose we do that?

With XML, we can easily agree and collaborate on a format and both of our languages have builtin libraries to extract the data we need.

It's easy to build and easy to debug.

Yes. The fact that XML parsers are pervasive is a good thing, and an advantage for XML as a technology. But it says absolutely nothing about XML as a metadata format. A standard parser suite for anything would have the same advantages.

Note also that there are other technologies with pervasively available parsers, like JSON, which don't share any of XML's warts.

OK, I'm going to send you a list using XML.

Now how do you propose we do that?

This is entirely the problem. The XML didn't solve anything. We still have to negotiate the terms of the transfer. We've agreed to use XML, true, but we are still at square one.

Of course you have to negotiate the terms of the transfer. You have to do that using any format (custom binary, XML, JSON, s-expressions, whatever). XML just defines a lot of common syntactic stuff which you would have to define anyway in any format you decide on.

I would argue that there's very little value in standardizing that syntactic stuff. Whatever tiny amount of value there might be is probably destroyed by picking a convention as almost universally inappropriate as XML.

<list><listitem>a</listitem><listitem>b</listitem><listitem>c</listitem></list> ?

The xml solved the problem of me having to write a parser from scratch for whatever terms of transfer you come up with.

And I don't really see the downside for most applications.

So the fact that 96% of the characters you are sending are markup is not a downside?

S-expressions solved this problem a long time ago. (a b c) is only 57% markup. I don't think you can get much more succinct than that and still express the idea of a list.
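
For anyone who wants to check the 96% and 57% figures, here is a quick sketch in Python. It assumes "markup" means every character that is not one of the three payload letters, so the spaces in the S-expression count as markup too. The helper name `markup_ratio` is made up for this example.

```python
def markup_ratio(document: str, payload: str) -> float:
    """Fraction of characters in `document` that are not payload characters."""
    return (len(document) - len(payload)) / len(document)


xml_doc = "<list><listitem>a</listitem><listitem>b</listitem><listitem>c</listitem></list>"
sexp_doc = "(a b c)"
payload = "abc"  # the three list elements, with no delimiters

# Round to whole percentages, as quoted in the thread.
xml_pct = round(100 * markup_ratio(xml_doc, payload))
sexp_pct = round(100 * markup_ratio(sexp_doc, payload))

print(f"XML: {xml_pct}% markup, S-expression: {sexp_pct}% markup")
```

With one-character items this is close to the worst case for XML; longer payloads would bring both ratios down.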

XML has redundancy by design. It's a deliberate trade-off to make it easier to read and write by hand, at the cost of size.

If message size is an issue then gzip the data. Or if you have very specific needs for processing speed, look into something like Google's Protocol Buffers.

You are optimizing at the wrong level if you are concerned about a few extra characters in a human-readable data exchange format.
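
A rough way to test the "just gzip it" suggestion is to compress the same list in both notations and compare sizes. This is only a sketch: the 1000 generated item names are an assumption for illustration, and real data may compress quite differently.

```python
import gzip

# Hypothetical payload: 1000 short item names.
items = [f"item{i}" for i in range(1000)]

xml_doc = "<list>" + "".join(f"<listitem>{x}</listitem>" for x in items) + "</list>"
sexp_doc = "(" + " ".join(items) + ")"

xml_raw = xml_doc.encode("utf-8")
sexp_raw = sexp_doc.encode("utf-8")

# gzip.compress is lossless, so decompressing gives back the original bytes.
xml_gz = gzip.compress(xml_raw)
sexp_gz = gzip.compress(sexp_raw)

print("XML:    raw", len(xml_raw), "bytes, gzipped", len(xml_gz), "bytes")
print("S-expr: raw", len(sexp_raw), "bytes, gzipped", len(sexp_gz), "bytes")
```

The repeated tags are highly compressible, so the gap between the two formats should narrow after compression, though the numbers depend on the data.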

But is it really easier to read and write? Properly-indented S-expressions are just as readable. Generating XML and then gzip-ing it is a lot more work (and requires a lot more libraries) than generating S-expressions.

Case in point:

  <table>
    <tr>
      <td>a</td>
      <td>b</td>
    </tr>
    <tr>
      <td>c</td>
      <td>d</td>
    </tr>
  </table>
vs:

  (table
    (tr
      (td a)
      (td b))
    (tr
      (td c)
      (td d)))
Perhaps the real problem is that too many people use terrible text editors. Paren-matching and auto-indentation make writing S-expressions orders of magnitude easier, and at least a constant factor easier than writing XML.

"I don't think you can get much more succinct than that and still express the idea of a list."

What matters is: a list of what. Telling me (5, 7, 11) means nothing.

    <sizes dress='5' pants='7' shoes='11' />
is what's of value.

Would an sexp be terser? Sure. But not enough to matter.

That does not seem to me to be typical XML, and if it is, it's really being stretched to do something it's not intended to do, IMO. XML's strength is in representing tree-based structures, but that appears to be an attempt to represent an associative structure. With Sexps, this is just as easy:

  (sizes (dress . 5) (pants . 7) (shoes . 11))
But in doing that, the structure really looks off, even though it's almost exactly mirroring the XML. I think this is a clue that the XML is a bit of a stretch. Much better (in Lisp code) is:

  (let ((sizes '((dress . 5) (pants . 7) (shoes . 11))))
     ; do something with sizes
     ...)
But I guess the real question is what this is trying to represent. If it's the sizes of various people, then the Sexps are quite simple:

  (sizes
    (sally (dress . 5) (pants . 7) (shoes . 11))
    (suzy ...)
    (alice ...)
    ...)
Of course, this could be expressed in XML, but how to do it best?

  <sizes name='sally' dress='5' pants='7' shoes='11' />
  <sizes name='suzy' ... />
  <sizes name='alice' ... />
But then perhaps all these sizes should be wrapped in another tag:

  <sizes-list>
    <sizes name='sally' dress='5' pants='7' shoes='11' />
    <sizes name='suzy' ... />
    <sizes name='alice' ... />
    ...
  </sizes-list>
But now we're getting away from the structure we defined using S-expressions, and besides, name='...' seems to be distinctly different information from the sizes themselves, so something else is strange. Perhaps

  <sizes>
    <sally dress='5' pants='7' shoes='11' />
    <suzy ... />
    <alice ... />
    ...
  </sizes>
Well, that looks nice, and closer to what we are trying to represent, but of course it's impossible to validate (at least from what I know of XML), since the person names are not part of our schema. We'll have to do something like:

  <sizes>
    <person name='sally'>
      <dress size='5' />
      <pants size='7' />
      <shoes size='11' />
    </person>
    <person name='suzy'>
      ...
    </person>
    <person name='alice'>
      ...
    </person>
    ...
  </sizes>
But now we have the problem of having to place every type of clothing in our schema. We'd better change it some more:

  <sizes>
    <person name='sally'>
      <clothing type='dress' size='5' />
      <clothing type='pants' size='7' />
      <clothing type='shoes' size='11' />
    </person>
    <person name='suzy'>
      ...
    </person>
    <person name='alice'>
      ...
    </person>
    ...
  </sizes>
Great! Now we have something that matches our desired structure and is easy to validate. Of course, it's much more verbose, but that made it easier to read and write, right?
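
To make the cost concrete, here is roughly what the consumer has to write to turn that last layout back into the associative structure we started from. This is a sketch using Python's standard `xml.etree.ElementTree`; only sally's entry is filled in, since the other entries are elided above, and treating sizes as integers is an assumption.

```python
import xml.etree.ElementTree as ET

xml_doc = """
<sizes>
  <person name='sally'>
    <clothing type='dress' size='5' />
    <clothing type='pants' size='7' />
    <clothing type='shoes' size='11' />
  </person>
</sizes>
""".strip()

root = ET.fromstring(xml_doc)

# Rebuild {person: {clothing type: size}} from the element/attribute layout.
sizes = {
    person.get("name"): {
        item.get("type"): int(item.get("size"))  # assumes sizes are integers
        for item in person.findall("clothing")
    }
    for person in root.findall("person")
}

print(sizes)
```

The mapping from tags and attributes to keys and values lives entirely in this code, not in the XML itself.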

Part of the problem with XML is that it causes these huge debates about how to structure and name the data. Another problem is that attributes don't nest nicely; that was the main problem in this instance. In other words, XML can be used nicely to represent a tree structure and reasonably well for lists or simple associative structures. But as soon as those associative elements need to map to something more complicated, you start having issues with how best to structure everything.

With S-expressions easily able to express assoc-lists while also being trivially nestable, these issues don't come up.

It is a downside, but most of the time I don't care.

If I am sending some big list many times and speed really matters, then I care, but usually I am not. Size and speed are cheap nowadays.

It's not just about the amount of markup, though, it's about the unnecessary complexity of the markup. The software that generates and parses S-expressions is much simpler than that which generates and parses XML. Of course, in Lisp, it's just

  (let ((list (read data)))
     ...)
But even in Python, you could easily hack together (not recommended) something like

  items = data.strip("()").split()  # flat lists only: "(a b c)" -> ["a", "b", "c"]
  ...
Of course, that's not robust, but a robust S-expression library is still much simpler than an XML library that needs a C SAX parser just to be usably fast.
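
As a rough illustration of how small a nested S-expression reader can be, here is a sketch in Python. It is not a full Lisp reader: it assumes bare atoms only, with no quoted strings, comments, or special handling of dotted pairs, and the name `read_sexp` is made up for this example.

```python
import re

# A token is either a single parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def read_sexp(text):
    """Parse one S-expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok  # a bare atom

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


flat = read_sexp("(a b c)")
nested = read_sexp("(table (tr (td a) (td b)) (tr (td c) (td d)))")
print(flat)
print(nested)
```

Nesting falls out of the recursion for free, which is the point being made about S-expressions here.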

If the argument is that XML is more human-readable, that is implying that it's being human-modified, and then XML creates more work since it's so verbose. If the verbosity is not an issue because it's auto-generated, that implies that it's not being read/modified by humans, and the whole point of using XML in the first place is lost. I just can't see any problem that XML solves that S-expressions didn't already solve in a simpler way.

He said: With XML, we can easily agree and collaborate on a format and both of our languages have builtin libraries to extract the data we need.

You replied: OK, I'm going to send you a list using XML.

The only thing you have proved here is that your blind hatred for XML makes you unable to read and parse what is posted.

When arguing against the evils of a standardised format which there are proper parsers for everywhere, failing to parse stuff yourself is probably not on the list of things you want to do.

You missed the point. After you've agreed on XML, you still have to agree on how to represent a list. You can use existing libraries to parse the XML, but you still have to write a parser to transform the resulting DOM into a list.
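
The DOM-to-list step looks something like this in Python with the standard `xml.dom.minidom` module. It is a sketch that assumes the `<list>`/`<listitem>` layout suggested earlier in the thread and that every item holds plain text.

```python
from xml.dom.minidom import parseString

xml_doc = "<list><listitem>a</listitem><listitem>b</listitem><listitem>c</listitem></list>"

# The XML library hands back a generic DOM tree...
dom = parseString(xml_doc)

# ...and this line is the format-specific part both sides still have to agree on:
# "a list is the text of each <listitem> element".
items = [node.firstChild.data for node in dom.getElementsByTagName("listitem")]

print(items)
```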

Plus, given that the availability of good XML parsers is one of the primary advantages claimed for XML, can anyone name some XML parsers that aren't ridiculously slow? Maybe I haven't looked hard enough, but I always seem to find that my own code to parse ad-hoc formats goes 10x faster than, e.g., expat parsing XML.

You missed the point. After you've agreed on XML, you still have to agree on how to represent a list.

No I didn't. Because that's exactly what the guy in the OP said. Agree on an XML schema for the data. After you've done that, writing some simple XPath to get your data is done in minutes.

The only time my XML code is 100% DOM is when I need to make things from scratch or do XML data manipulation.
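
For a sense of what "some simple XPath" looks like once a schema is agreed, here is a sketch using the limited XPath subset in Python's standard `xml.etree.ElementTree`. The `person`/`clothing` layout is borrowed from the sizes example elsewhere in this thread and is only an assumption about what such a schema might be.

```python
import xml.etree.ElementTree as ET

xml_doc = """<sizes>
  <person name='sally'>
    <clothing type='dress' size='5' />
    <clothing type='shoes' size='11' />
  </person>
</sizes>"""

root = ET.fromstring(xml_doc)

# One path expression selects the element; no manual tree walking needed.
shoe = root.find("./person[@name='sally']/clothing[@type='shoes']")
shoe_size = shoe.get("size")  # attribute values come back as strings

print(shoe_size)
```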

I think that my favorite quip on the topic is:

“XML is a giant step in no direction at all.” (Erik Naggum)

He also said this (I don't think he's much of an XML fan):

"Structure is _nothing_ if it is all you got. Skeletons _spook_ people if they try to walk around on their own. I really wonder why XML does not."

An embarrassment that solves no problem, not even the problem of agreeing on how to represent data?

Surely you jest. In the real world (well, my world anyway), moving bits of data around in a file format that wildly different systems can understand is a major problem, even if it's not a terribly sexy one. The admittedly ugly representation of data in some HTML-inspired fashion may be nowhere near a complete solution to the data representation problem, but it's most definitely better than not taking a stab at the problem at all.

I'm confused as well. XML is fantastic for providing an easy way to roll customized data storage and interchange documents.

The article is pointing to technologies like JSON as a "backlash" to XML. I only use JSON when sending PHP objects directly to JavaScript to manipulate. Why create a data interchange format if you don't need to?

XML is like Java. The language isn't friendly, but the platform has so many man-years invested into it that it may be the best tool for the job.

Maybe my "taste" is to have a solid platform at the expense of some syntactic niceties. In any case, saying that anyone who disagrees with you is certainly ignorant just makes you come off as foolish.

What do you propose instead of a markup language for the things HTML is used for?

I think by markup language he means SGML or its derivatives.