HTML the markup language was clearly intended as an SGML vocabulary: TBL himself said as much [1], and HTML also reused element names that the SGML spec/handbook used as example/folklore vocabulary, such as the elements for paragraphs and headings. What browsers made out of it isn't the matter here. But even if it were, the "practical, real-world HTML out there" argument is mostly used by an ad company/browser cartel to pull up the ladder, and it is made worse day in, day out through an atrocious and absurdly voluminous HTML spec (and by CSS, of course).

Even though Ian Hickson, of WHATWG, wanted to capture HTML as it was understood by browsers, he couldn't help but add elements of his own, such as marking up ads as "aside" lol. There is also the alien sectioning-elements concept, which gave rise to the flawed "outline algorithm" and to the misuse of heading elements (and the earlier failure to understand SGML's RANK feature). That problem was only fixed last year [2], by an incompatible change to HTML that invalidated documents using hgroup as originally advised.

In practice, very few changes to the HTML syntax took HTML outside SGML. For the most part, these were the ad-hoc and basically unnecessary commenting rules for the script and style elements, meant to keep legacy browsers from rendering JavaScript and CSS, respectively, when those were introduced.

[1]: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
[2]: https://github.com/w3c/htmlwg/issues/22
You seem to always start with the assumption that SGML was (and perhaps is) an end goal. I deny this.
HTML was designed as an SGML vocabulary, but, where it mattered, it was never implemented as an SGML vocabulary. If Tim Berners-Lee ever seriously expected it to be treated as SGML, I suspect he hadn't thought things through well enough (though that could also just be hindsight bias on my part).
There has never been any particular virtue in HTML being an SGML vocabulary. No one that mattered (which mainly means browsers) cared about SGML, then or now, and no web developers or end users care about SGML. So being SGML is just needless complication and a potential source of confusion, since it implies behaviour that differs from reality. SGML is a hideous, complex beast that no one wants to work with, and almost everyone who has heard of it is glad it's dead.
Yes, SGML had some nice ideas. Yes, we keep on reinventing parts of it. Yes, a variant of Greenspun’s tenth rule applies. But SGML was just too flexible/generic, large and ugly. It doesn’t actually solve things. And the current HTML parser is the best thing since sliced bread and my favourite popular file type spec by a large margin despite its size, because it’s clear, unambiguous, and implementable.