by WorldMaker 1428 days ago
The ill-fated XHTML 2.0 was where the academics with actual interest in the semantic meanings of tags got too busy trying to cardinalize their semantic meanings. My understanding was that HTML5 "imported" some of the tag names but never had an interest in the intended semantic meanings as that was part of the schism that killed XHTML 2.0 and was something HTML5 wanted to avoid entirely for pragmatic reasons.

Under that understanding I think you can probably still find interesting semantic versions of these tags in the XHTML 2.0 mailing lists and schism discussions. They aren't relevant to HTML's present, but might be interesting for someone truly curious about the path not taken in semantic HTML (the path unlikely at this point to ever be taken).


I was curious, so I compared the lists of element tags in HTML 4.0 and XHTML 2.0. Excluding the XForms module from XHTML 2.0, the former has 91 tags, reduced to 67 in the latter.

XHTML 2.0 removed 42 tags: acronym, applet, area, b, base, basefont, bdo, big, button, center, dir, fieldset, font, form, frame, frameset, h1, h2, h3, h4, h5, h6, hr, i, iframe, input, isindex, label, legend, map, menu, noframes, noscript, optgroup, option, s, select, small, strike, textarea, tt, u.

XHTML 2.0 added 18 tags: access, action, addEventListener, blockcode, di, dispatchEvent, heading, l, legacyheadings, listener, preventDefault, removeEventListener, ruby, section, separator, standby, stopPropagation, summary.
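For anyone who wants to double-check the tally, here is a rough Python sketch of the bookkeeping. The two lists are copied from the lines above; the helper name `expected_count` is made up for illustration, and the starting figure of 91 is simply the HTML 4.0 count quoted above, not something re-derived from the spec.

```python
# Tags listed above as removed by XHTML 2.0 (relative to HTML 4.0).
removed = """acronym applet area b base basefont bdo big button center dir
fieldset font form frame frameset h1 h2 h3 h4 h5 h6 hr i iframe input isindex
label legend map menu noframes noscript optgroup option s select small strike
textarea tt u""".split()

# Tags listed above as added by XHTML 2.0.
added = """access action addEventListener blockcode di dispatchEvent heading l
legacyheadings listener preventDefault removeEventListener ruby section
separator standby stopPropagation summary""".split()


def expected_count(base: int, removed: list[str], added: list[str]) -> int:
    """Hypothetical helper: tag count after applying removals and additions."""
    # A tag showing up in both lists would mean the lists are inconsistent.
    assert not set(removed) & set(added)
    return base - len(set(removed)) + len(set(added))


print(len(removed), len(added), expected_count(91, removed, added))
```

The same helper should work for the HTML5 lists further down, as long as the base stays at 91.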

AFAICT, XHTML 2.0 reorganized tags into modules, yes, but didn't actually try to expand the set of semantic tags, except for XForms--the XForms module looks really complex. And those module groupings were more concerned with functionality, not content semantics, per se.

FWIW, here are the XForms 2.0 tags: action, delete, dispatch, group, input, insert, load, message, model, output, range, rebuild, recalculate, refresh, repeat, reset, revalidate, secret, select1, select, send, setfocus, setindex, setvalue, submit, switch, textarea, trigger, upload

By contrast, HTML5 looks to have added more semantic tags, and more incoherently. HTML5 has 111 elements (excluding math and svg).

HTML5 removed 14 tags: acronym, applet, basefont, big, center, dir, font, frame, frameset, isindex, noframes, param, strike, tt

HTML5 added 34 tags: article, aside, audio, bdi, canvas, data, datalist, details, dialog, embed, figcaption, figure, footer, header, hgroup, main, mark, meter, nav, output, picture, progress, rp, rt, ruby, section, slot, source, summary, template, time, track, video, wbr

Source: https://www.w3.org/TR/html4/index/elements.html, https://www.w3.org/TR/xhtml2/elements.html, https://html.spec.whatwg.org/multipage/indices.html#elements...

As far as I recall, that "final" draft of the XHTML 2.0 that W3 posted is "post-schism" just to get something out to compete with the growing momentum of HTML5 and kick the semantic can down the road again to XHTML 3.0 (after most of the damage of the schism was already done). I recall early XHTML 2 drafts had at least article, aside, section, hgroup, and others. I don't know where you would track down such drafts other than combing ancient mailing list archives.
Section and article make sense as "parts of a book". However, unlike in HTML, an article there always sits below a section in the hierarchy; it is actually below paragraphs. This scheme is common in legal texts in many languages; I don't know if this is the case in the USA.

The hgroup element also seems to be related to this.

My reasoning has always been that an article is a separable entity, which can do without the given context. (E.g., you can share it, or you can present multiple of them in varying order.) So a document may have sections, which may include articles, which in turn include sections, like the table of contents, a section of images, etc. So there's no distinctive hierarchy to them, as each may contain the other. (Mind that this is somewhat different from the use of articles in legal documents, which are integral elements of that document and lose meaning, if provided out of context.)
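To make the "each may contain the other" point concrete, here is a small sketch using Python's standard `html.parser`. The sample markup and the `NestingTracker` class are invented for illustration; the sketch only records which article/section elements are open whenever another one starts.

```python
from html.parser import HTMLParser

# Hypothetical page: a section holding an article that holds its own sections.
SAMPLE = """
<section>
  <h2>Latest posts</h2>
  <article>
    <h3>Post title</h3>
    <section><h4>Contents</h4></section>
    <section><h4>Images</h4></section>
  </article>
</section>
"""


class NestingTracker(HTMLParser):
    """Records the chain of open article/section elements at each start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open article/section elements
        self.paths = []   # one "a > b > c" string per start tag seen

    def handle_starttag(self, tag, attrs):
        if tag in ("article", "section"):
            self.stack.append(tag)
            self.paths.append(" > ".join(self.stack))

    def handle_endtag(self, tag):
        if tag in ("article", "section") and self.stack and self.stack[-1] == tag:
            self.stack.pop()


tracker = NestingTracker()
tracker.feed(SAMPLE)
for path in tracker.paths:
    print(path)
```

The recorded paths show a section containing an article, which in turn contains sections, so neither element is fixed above the other.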

While any such interpretation is somewhat funny in the context of the parent comment, it may still turn out useful. E.g., if we were to scrape any content from an existing site in order to reintegrate it for a relaunch or a similar purpose.

And, while we're at it, a div is really just a technical means for applying something to a group of elements (e.g., in its original use, an attribute for centered text presentation); think of it as a block in programming. Nothing semantic to see here, keep calm and carry on…

BTW, thanks for mentioning the hgroup, which is often overlooked, but really makes sense, when combining headings and subheadings, which are to be understood as a single item (like the head of an article, yes, an article in the common sense).

The actual specification of article and section elements in HTML is pretty much what you said.

My issue with them is not with their roles, but with their names. And, from the article and from OP, it seems I'm not the only one. I think "region" as it is used by WAI-ARIA would be a better name. Also something like "contentinfo" instead of "footer". And "complementary" instead of "aside"...
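For reference, a tiny Python sketch of the name pairs mentioned here. The dict and the `landmark_role` helper are made up for illustration, and the mapping is a simplification: as far as I understand, the roles only apply under the conditions noted in the comments.

```python
# HTML element -> WAI-ARIA landmark role, for the three names discussed above.
# Simplified: each role applies only under the condition noted.
LANDMARK_ROLE = {
    "section": "region",      # only when the section has an accessible name
    "footer": "contentinfo",  # only for a page-level footer, not one nested in an article or section
    "aside": "complementary",
}


def landmark_role(tag: str):
    """Hypothetical helper: return the landmark role for a tag, or None."""
    return LANDMARK_ROLE.get(tag.lower())


for tag in ("section", "footer", "aside", "div"):
    print(tag, "->", landmark_role(tag))
```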

> E.g., if we were to scrape any content from an existing site in order to reintegrate it for a relaunch or a similar purpose.

The spec calls this "outline".

Related to divs: I find it ironic that making pages with tables was frowned upon 20 years ago, yet it is hot again now, except we are calling them "grids".

They say the lack of usage of hgroup is due to the lack of support by screen readers. Another common use case is <h1>Chapter 1</h1><h2>Foobar</h2>.
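As a sketch of what "understood as a single item" could mean for a scraper, here is a Python example that folds the headings inside an hgroup into one entry. It follows the h1 + h2 usage described in this thread; the `HeadingCollector` class, the sample markup around the Chapter 1 / Foobar pair, and the ": " separator are all assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical input: the Chapter 1 / Foobar pair wrapped in an hgroup.
SAMPLE = (
    "<hgroup><h1>Chapter 1</h1><h2>Foobar</h2></hgroup>"
    "<p>Text</p><h2>Next part</h2>"
)


class HeadingCollector(HTMLParser):
    """Collects headings, treating everything inside an hgroup as one item."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.items = []          # one entry per heading item
        self.in_hgroup = False
        self.in_heading = False
        self.group = []          # heading texts inside the current hgroup
        self.text = []           # text chunks of the current heading

    def handle_starttag(self, tag, attrs):
        if tag == "hgroup":
            self.in_hgroup, self.group = True, []
        elif tag in self.HEADINGS:
            self.in_heading, self.text = True, []

    def handle_data(self, data):
        if self.in_heading:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self.in_heading:
            self.in_heading = False
            heading = "".join(self.text).strip()
            # Inside an hgroup, hold the heading back until the group closes.
            (self.group if self.in_hgroup else self.items).append(heading)
        elif tag == "hgroup" and self.in_hgroup:
            self.in_hgroup = False
            self.items.append(": ".join(self.group))


collector = HeadingCollector()
collector.feed(SAMPLE)
print(collector.items)
```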

Regarding the table irony, see also the common use of table, table-row, and table-cell display styles for anything but actual tables. ("If I'm using divs, it's fine!") :-)

(Tables should even be more accessible, since there is <th>, both in <thead> and with `scope="row"` for table rows.)

Something I've been guilty of (sometimes) for emulating hgroup: <h1>Heading<br /><small>Subhead</small></h1>.

In both cases article means something like "an atom of content". In legalese each statement is a separate article; in other contexts an entire book can be an article.

https://www.etymonline.com/word/article