by mrighele 639 days ago
I know it is a single example and we shouldn't extrapolate too much from it, but in the case of HTML, those who accepted more liberal input (HTML4/5) won out over those that were more conservative (XHTML).
5 comments

HTML is rather different because it's authored by people. It's typically (though not always!) a good idea to not be too pedantic about accepting user input if you can. XHTML (served with the correct Content-Type) will completely error out if you made a typo and didn't test carefully enough. Useful in dev cycle? Sure. In production? Less so. "The entire page goes tits up because you used <br> instead of <br />" is just not helpful (and also: needlessly pedantic).
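To make the `<br>` vs `<br />` point concrete, here is a small sketch using only Python's standard library. `xml.etree.ElementTree.fromstring` applies XML well-formedness rules, roughly what a browser does for XHTML served as application/xhtml+xml, while `html.parser.HTMLParser` is a tolerant tokenizer. Neither is a browser engine (HTMLParser does not implement the HTML5 tree-construction algorithm); the helper names are just for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML-style void tag <br> instead of XHTML's self-closed <br />.
snippet = "<p>Line one<br>Line two</p>"


def parse_strict(markup):
    """XML rules: any well-formedness error is fatal for the whole document."""
    try:
        ET.fromstring(markup)
        return "rendered"
    except ET.ParseError as err:
        return f"fatal error: {err}"


class TagCollector(HTMLParser):
    """Records start tags; HTMLParser does not stop on unclosed tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_lenient(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parse_strict(snippet))                          # fatal error: ...
print(parse_strict("<p>Line one<br />Line two</p>"))  # rendered
print(parse_lenient(snippet))                         # ['p', 'br']
```

One missing slash is the difference between a rendered page and none at all under the strict parser, which is the "goes tits up" failure mode described above.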

But that doesn't really apply to protocols like TCP. Postel's "law" is best understood in the context of 1980, when TCP had been around for a while but without a real standard, everyone was kind of experimenting, and there were tons of little incompatibilities. In this context, it was reasonable and practical advice.

For a lot of other things though: not so much. "Fail fast" is typically the better approach, which will benefit everyone, especially the people implementing the protocols.
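As an illustration of "fail fast" vs "be liberal", here are two hypothetical parsers for a made-up `Name: value` line format (loosely HTTP-header-like, but not following any particular RFC). The function names and rules are assumptions for the sake of the example.

```python
def parse_header_strict(line):
    """Fail fast: accept only 'Name: value' with no whitespace in the name."""
    name, sep, value = line.partition(":")
    if not sep or not name or any(ch.isspace() for ch in name):
        raise ValueError(f"malformed header line: {line!r}")
    return name.lower(), value.strip()


def parse_header_liberal(line):
    """Be liberal: tolerate stray spaces and even a missing colon."""
    if ":" in line:
        name, _, value = line.partition(":")
    else:
        # Guess that the first space was meant to be the colon.
        name, _, value = line.partition(" ")
    return name.strip().lower(), value.strip()


for line in ["Content-Type: text/html", "Content-Type text/html", "Host : example.com"]:
    print("liberal:", parse_header_liberal(line))
    try:
        print("strict: ", parse_header_strict(line))
    except ValueError as err:
        print("strict:  rejected,", err)
```

The liberal version looks friendlier, but once it ships, every other implementation comes under pressure to accept colon-less lines too. The strict one surfaces the sender's bug on the first test run.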

This is also why Sendmail became the de facto standard around the same time, by the way: it was bug-compatible with everything else. Later this became a liability (sendmail.cf!), but originally it was a great feature.

RFC 9413, referenced in a parent comment, mentions HTML. It points out that formats meant to be human-authored may benefit more from being accepted liberally.

I also read that XHTML made template authoring hard, as the template itself might not be valid XHTML and/or different template inputs might make output invalid. (I sadly can't find the source of this point right now, but I can't claim credit for it).
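The "different template inputs might make output invalid" part is easy to reproduce. In this sketch, Python's `str.format` stands in for a real template engine and the template itself is hypothetical: the markup is well-formed, but the rendered output is only well-formed XML if the inserted value is escaped.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical template: the markup itself is well-formed XHTML...
TEMPLATE = "<p>Search results for: {query}</p>"


def render(query, escape):
    return TEMPLATE.format(query=html.escape(query) if escape else query)


def is_well_formed(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# ...but whether the rendered output is well-formed depends on the input.
print(is_well_formed(render("cats", escape=False)))         # True
print(is_well_formed(render("cats & dogs", escape=False)))  # False: bare '&'
print(is_well_formed(render("cats & dogs", escape=True)))   # True: '&amp;'
```

Under text/html the bare `&` would typically have been shrugged off; under application/xhtml+xml it takes down the whole page, and only for the users who happened to type an ampersand.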

I don't recall XHTML being harder to generate from PHP and ASP templates. It's largely down to making sure that all tags in the output are always balanced, which isn't difficult at all.
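A minimal sketch of the "all tags always balanced" check, using Python's `html.parser` as the tokenizer and a stack of open tags. It checks tag balance only, not the rest of XML well-formedness (escaping, quoted attributes, a single root element), and it makes no attempt to recover after the first mismatch.

```python
from html.parser import HTMLParser


class BalanceChecker(HTMLParser):
    """Stack-based check that every start tag has a matching end tag.

    The inherited handle_startendtag() calls handle_starttag() and then
    handle_endtag(), so XHTML-style <br /> balances itself, while a bare
    <br> stays open and gets reported.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_balanced(markup):
    checker = BalanceChecker()
    checker.feed(markup)
    checker.close()
    # Anything still on the stack was never closed (innermost first).
    return checker.errors + [f"unclosed <{t}>" for t in reversed(checker.stack)]


print(check_balanced("<ul><li>One</li><li>Two<br /></li></ul>"))  # []
print(check_balanced("<p>One<br>Two</p>"))
```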

With PHP specifically, there was an issue where the shorthand <? syntax for code snippets would conflict with the <?xml declaration that would normally be placed at the beginning of an XHTML document: PHP would see the <? and try to interpret the rest of it as PHP code, which obviously didn't work. The workaround was to disable short tags (the short_open_tag setting in php.ini) and always use <?php explicitly.
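A toy model of the clash, written in Python rather than PHP and deliberately simplified: it is not PHP's real lexer, it ignores the `<?=` echo shorthand, and the function name is hypothetical. It only shows where the first "code block" would be detected with short tags on vs off.

```python
import re

# Toy model, NOT PHP's actual lexer. With short tags on, "<?" alone opens a
# PHP block; with them off, only "<?php" does.
OPEN_TAG = {
    True: re.compile(r"<\?"),
    False: re.compile(r"<\?php\b"),
}

PAGE = '<?xml version="1.0" encoding="UTF-8"?>\n<html><?php echo $title; ?></html>'


def first_php_block_line(source, short_tags):
    """Return the 1-based line where the first PHP block would start, or None."""
    match = OPEN_TAG[short_tags].search(source)
    if match is None:
        return None
    return source.count("\n", 0, match.start()) + 1


print(first_php_block_line(PAGE, short_tags=True))   # 1: the XML declaration is mistaken for PHP
print(first_php_block_line(PAGE, short_tags=False))  # 2: only the real <?php block matches
```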

I would almost argue a failing of so many standards is the lack of surrounding tooling. Is this implementation correct? Who knows! Try it against this other version and see if they kind of agree. More specifications need to require test suites.
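What "require test suites" could look like in practice: a shared list of test vectors, including inputs that MUST be rejected, run against any implementation. Everything here is hypothetical (the vectors are loosely inspired by a Content-Length-style header, not taken from any RFC), but it shows how a suite turns "see if they kind of agree" into a pass/fail answer.

```python
import re

REJECT = object()  # sentinel: the spec says this input must be rejected

# Hypothetical conformance vectors: (input line, mandated outcome).
VECTORS = [
    ("Content-Length: 42", 42),
    ("Content-Length: 0", 0),
    ("Content-Length: -1", REJECT),
    ("Content-Length: 4 2", REJECT),
    ("Content-Length: 0x2A", REJECT),
]


def run_suite(parse):
    """Run every vector against an implementation; return the failures."""
    failures = []
    for line, expected in VECTORS:
        try:
            got = parse(line)
        except ValueError:
            got = REJECT
        if got != expected:
            failures.append((line, expected, got))
    return failures


def strict_parse(line):
    name, sep, value = line.partition(":")
    value = value.strip()
    if name != "Content-Length" or not sep or not re.fullmatch(r"[0-9]+", value):
        raise ValueError(line)
    return int(value)


def sloppy_parse(line):
    # Accepts whatever int() accepts once spaces are removed (signs, hex, ...).
    return int(line.split(":", 1)[1].replace(" ", ""), 0)


print(run_suite(strict_parse))       # []
print(len(run_suite(sloppy_parse)))  # 3
```

Both parsers agree on the happy path; only the rejection vectors tell them apart, which is exactly what ad-hoc interop testing tends to miss.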

Am I correct that malformed pages in XHTML would have triggered the browser to output a red XML error and fail to render the page at all?

Yes, but only if you served the XHTML with the proper MIME type of application/xhtml+xml. Nearly everyone served it as text/html, which led to the document being interpreted as a weird pseudo-XHTML/HTML4 hybrid dialect with all sorts of browser idiosyncrasies [1].

[1] https://www.hixie.ch/advocacy/xhtml
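A sketch of why the MIME type mattered: the same bytes go to a different parser depending on the Content-Type. Real browsers do far more (sniffing, namespaces, the full HTML5 parser); here Python's `xml.etree` and `html.parser` merely stand in for the two browser parsers, and `render` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately broken XHTML: the <em> element is never closed.
DOC = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       '<p>Hello <em>world</p></body></html>')


def render(body, content_type):
    """Toy dispatch on the MIME type, ignoring parameters like charset."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/xhtml+xml":
        try:
            ET.fromstring(body)
        except ET.ParseError as err:
            return f"XML parsing error: {err}"
        return "rendered (XML parser)"
    # Anything else, including text/html, goes to a tolerant HTML tokenizer.
    HTMLParser().feed(body)
    return "rendered (HTML parser)"


print(render(DOC, "text/html; charset=utf-8"))  # rendered (HTML parser)
print(render(DOC, "application/xhtml+xml"))     # XML parsing error: ...
```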

Not really, since in the end HTML5 defined a precise parsing algorithm that AFAIK everyone follows.

HTML5 was born in an era of decent HTML authoring tooling. Very few people write HTML by hand nowadays. This was not true of earlier versions.

Also note that HTML5 codified into liberal acceptance some of the "lazy" manual errors that people made in the early days (many of which were strictly and noisily rejected in XHTML, for example).
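Unclosed `<p>` and `<li>` elements are a familiar case: XHTML rejects them, while the HTML5 parsing algorithm specifies exactly where they get closed. The sketch below is a toy model of that one rule, not the real algorithm: it only looks at the innermost open element (the spec uses scopes and covers many more elements), ignores void elements, and simply drops stray end tags. The class name is hypothetical.

```python
from html.parser import HTMLParser

# Toy rule: a new <p> implicitly closes an open <p>; a new <li> closes an open <li>.
AUTO_CLOSE = {"p": {"p"}, "li": {"li"}}


class ToyTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.events = []  # open/close events, including implied closes

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] in AUTO_CLOSE.get(tag, ()):
            self.events.append(f"</{self.stack.pop()}> (implied)")
        self.stack.append(tag)
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open element;
        # a stray end tag with no matching open element is ignored.
        if tag in self.stack:
            while True:
                open_tag = self.stack.pop()
                implied = "" if open_tag == tag else " (implied)"
                self.events.append(f"</{open_tag}>{implied}")
                if open_tag == tag:
                    break


builder = ToyTreeBuilder()
builder.feed("<ul><li>One<li>Two</ul>")
print(builder.events)
# ['<ul>', '<li>', '</li> (implied)', '<li>', '</li> (implied)', '</ul>']
```

Even this two-rule toy needs careful bookkeeping; multiply it by every element and every historical quirk and you get the parser linked below.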

The overwhelming complexity of the HTML5 parser [1] is a testament to the 30 years of implementation quirks it's been forced to absorb.

[1] https://html.spec.whatwg.org/multipage/parsing.html