| > It's a tradeoff, of course.
> Reasonable people can disagree about whether the tradeoff is worth it.
> But that's my point: reasonable people can disagree.

Sure, we disagree.
I understand that you think there is a tradeoff, like in many design decisions.
However, you did not tell me what benefits you expect from having the syntactic distinction.

> It syntactically distinguishes information that applies to the tag from information to which the tag applies.

That statement alone does not say why it is a good thing to distinguish the two kinds of information syntactically. Being able to draw the distinction is interesting, of course. But the fact that a distinction exists at the semantic level does not mean it should also exist at the syntactic level. The syntax for attributes is unfortunately flawed, which is why ...

> Any language feature can be abused. The proper response IMO is to stop abusing the feature, not to eliminate it.

... is taking the problem completely backwards. No language feature was abused. In fact, attributes were acting against a natural organization of information.
And that is why, as a workaround, metadata had to be expressed with tags. I don't expect you to agree with this, so consider a classical example of markup usage. You want to represent a document with reviewers, publication dates and authors (those are meta-information, right?) as well as content (the actual text being stored).
However, there is also meta-information about people (name, title) and dates (calendar). Where do we store structured meta-information about attributes? Unfortunately, attributes cannot express structured information, and they cannot have meta-information attached to them.
Here is what you obtain:

<root>
  <document author="id0"
            published_in="ACM"
            publication_date="2010/03/02"
            publication_volume="345"
            link="doi://1020301.202.301.1023"
            reviewers_id="id1;id2;id3;id4">
    ... content ...
  </document>
  <peoples>
    <people id="id0"><name>John Doe</name>...</people>
    <people id="id1">...</people>
    <people id="id2">...</people>
    <people id="id3">...</people>
    <people id="id4">...</people>
  </peoples>
</root>
Notice how the information about the publication is scattered across several attributes instead of being a single attribute with sub-components. (Was the document published in March or in February? In which timezone?) Authors are only indirectly referenced through identifiers, because the real structure cannot easily be expressed with attributes alone.
Also, the list of reviewers is actually a string of semicolon-separated identifiers. And so people are not just meta-information but tags with nested children, and you must have a "root" element around your document and a special list of "peoples".
Just to be clear, having identifiers is not bad and could be a good way to model relationships.
The problem is that you no longer have the choice of using different levels of meta-attributes. Notice how the link to a "DOI" identifier is itself encoded in a string (this is a custom format just for this example), instead of using a more useful nested structure:

(link (protocol doi)
      (path (digits 1020301 202 301 1023)))

Each time you use a string to encode structured information in an attribute with a custom mini-language, you are asking for trouble.
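To make that concrete, here is a minimal Python sketch (an illustration only, not part of any real format) of what a consumer of the XML above ends up writing. The helper names `parse_reviewers`, `parse_link` and `parse_date` are hypothetical, and the `<document>` element is trimmed down to the three encoded attributes from the example.

```python
import xml.etree.ElementTree as ET

# The <document> element from the example, trimmed to its encoded attributes.
xml_text = '''<document publication_date="2010/03/02"
          link="doi://1020301.202.301.1023"
          reviewers_id="id1;id2;id3;id4"/>'''

doc = ET.fromstring(xml_text)

def parse_reviewers(value):
    # Mini-language 1: identifiers separated by semicolons.
    return value.split(";")

def parse_link(value):
    # Mini-language 2: "<protocol>://<digits separated by dots>".
    protocol, _, path = value.partition("://")
    return {"protocol": protocol, "digits": [int(d) for d in path.split(".")]}

def parse_date(value):
    # Mini-language 3: three integers separated by slashes. Which one is
    # the month is a convention the XML parser knows nothing about.
    return tuple(int(part) for part in value.split("/"))

# The XML parser only hands back opaque strings; all structure is recovered here.
reviewers = parse_reviewers(doc.get("reviewers_id"))
link = parse_link(doc.get("link"))
date = parse_date(doc.get("publication_date"))
```

Three attributes, three ad hoc decoders, and none of them handles escaping yet (an identifier containing a semicolon would silently break the first one).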
Imagine how each of those strings now needs a dedicated parser, because you need to take care of escaping "special" characters.
You might say that it is unfortunate that attributes are "flat", and that some hierarchical way of expressing attributes would be preferable.
And then, you would have nested attributes as well as nested elements. Why not merge them into the same syntactic structure? If you consider that identifiers are not necessary, or if your format allows sharing common sub-expressions (like #1=(author), #1#), then you could go with this kind of data format:

(document (author (name "John Doe") (job-title "Professor") (institution "MIT"))
          (anchor
            (link (protocol doi)
                  (path (digits 1020301 202 301 1023)))
            (target blank-page))
          (reviewers (reviewer (name "..."))
                     (reviewer (name "..."))
                     ...)
          (encoding (utf 8))
          (sections
            ...))

Then, you have multiple layers of "meta"-information, instead of just two: "data" and "dumb meta-data".
I agree we disagree, but I do not think both approaches are equal.
You talk about tradeoffs, but I really do not see anything useful in having attributes, whereas I can see the inconvenience they bring when trying to structure information in a meaningful way. |
That's because you chose examples that show attributes at their worst.
Suppose that instead of a document with a single author you had a document to which many people contributed, and you wanted to mark it up to show who wrote which section. Using attributes (and identifiers) you would have, e.g.
This example also highlights why it is sometimes NECESSARY to use identifiers in order to produce the semantically correct structure. Suppose you put all the author information in-line as you suggest. The result would look something like this:

Were the first and third parts written by the same person, or by two different people whose names both happen to be Bob? If you put everything in-line there is no way to express that two pieces of structure are intended to be EQ to each other.
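The same loss of identity can be sketched in a few lines of Python, using JSON as a stand-in for any in-line format without identifiers. Everything here is hypothetical illustration: the two "Bob" records, the `part` numbers and the `id0`/`id1` keys are made up for the example. Python's `is` plays the role of EQ, and `==` the role of structural equality.

```python
import json

# Two distinct people who happen to share a name (hypothetical data).
bob_a = {"name": "Bob"}
bob_b = {"name": "Bob"}

# In memory, parts 1 and 3 share one author object; part 2 has the other Bob.
inline = [
    {"part": 1, "author": bob_a},
    {"part": 2, "author": bob_b},
    {"part": 3, "author": bob_a},
]

# Writing everything in-line and reading it back keeps the values,
# but loses the information about which authors were the same object.
restored = json.loads(json.dumps(inline))
inline_1_is_3 = restored[0]["author"] is restored[2]["author"]  # identity is gone
inline_1_eq_2 = restored[0]["author"] == restored[1]["author"]  # the two Bobs look alike

# With identifiers, "same person" is part of the data and survives the round trip.
with_ids = {
    "people": {"id0": bob_a, "id1": bob_b},
    "sections": [
        {"part": 1, "author": "id0"},
        {"part": 2, "author": "id1"},
        {"part": 3, "author": "id0"},
    ],
}
restored_ids = json.loads(json.dumps(with_ids))["sections"]
ids_1_same_as_3 = restored_ids[0]["author"] == restored_ids[2]["author"]
ids_1_same_as_2 = restored_ids[0]["author"] == restored_ids[1]["author"]
```

After the in-line round trip, all three authors compare equal and none are identical, so the reader cannot tell the two Bobs apart. With identifiers, parts 1 and 3 still point at the same person and part 2 at a different one.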