| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dmsnell 682 days ago

For better or worse XHTML, also known as the XML serialization of HTML, cannot represent all valid HTML documents. HTML and XML are different languages with vastly different rules, and it's fairly moot now to consider replacing them.

Many of the "problems" with HTML are still handled adequately simply by using a spec-compliant parser instead of regular expressions, string functions, or attempting to parse HTML with XML parsers like PHP's `DOMDocument`.

Every major browser engine and every spec-compliant parser interprets any given HTML document in the same prescribed deterministic way. HTML parsers are't "loose" or "forgiving" - they simply have fully-defined behavior in the presence of errors.

This turned out to be a good thing because people tend to prefer being able to read _most_ of a document when _some_ errors are present. The "draconian error handling" made software easier to write, but largely deals with errors by pretending they can't exist.