Hacker News
by singpolyma3 40 days ago
To be fair, HTML5 also has a defined parsing algorithm. It just happens to always work on any input to produce a webpage.
3 comments

Yes, this is what you'd want. It doesn't have to be as complicated as the HTML5 algorithm either. That one is complicated because it was a harmonization of at least three browsers' multi-decade heuristics and untold terabytes of existing HTML practice. An algorithm unconcerned with backwards compatibility could be much simpler, while still clearly defining error behavior that is much easier to use than "scream and die".
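
To make that concrete, here is a minimal sketch of what "simple but fully defined error behavior" could look like. It is a toy tag language, not HTML5 and not anything proposed in the linked page; the four recovery rules and the names `parse` and `serialize` are assumptions made up for illustration.

```python
import re

# A token is a tag like <b> or </b>, a run of text, or a lone "<".
# Every character of any input falls into one of these three cases.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)|(<)")


def parse(text):
    """Parse a toy tag language. Every input yields a tree; nothing raises."""
    root = {"tag": "#root", "children": []}
    stack = [root]  # currently open elements, innermost last
    for closing, name, chars, stray in TOKEN.findall(text):
        if chars or stray:
            # Rule 1: a "<" that does not start a tag is plain text.
            stack[-1]["children"].append(chars or stray)
        elif not closing:
            node = {"tag": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Rule 2: a close tag closes the nearest open element with
            # that name, implicitly closing everything nested inside it.
            # Rule 3: a close tag that matches nothing is ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["tag"] == name:
                    del stack[i:]
                    break
    # Rule 4: anything still open at end of input is closed implicitly.
    return root


def serialize(node):
    """Write the tree back out as well-formed markup."""
    inner = "".join(
        child if isinstance(child, str) else serialize(child)
        for child in node["children"]
    )
    if node["tag"] == "#root":
        return inner
    return f"<{node['tag']}>{inner}</{node['tag']}>"


print(serialize(parse("<b>bold<i>both</b>tail")))
print(serialize(parse("</x>hi <p>open")))
```

Two conforming implementations of these rules cannot disagree about the tree, which is the property that matters. Whether the rules are the *right* ones is a separate design question.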

And it's still unambiguous. You can cringe at what some people do, but it would be strictly a taste issue rather than a technical one, as the parse would still be unambiguous. And if you think you can fix taste issues with a technical specification, well, you've already lost anyhow.

I think the GP has an issue not with the specification part, but with the part where it's forbidden for clients to render a noncompliant page.
It's not forbidden. They just don't render certain noncompliant pages. Namely the ones with gross syntax errors.
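
For anyone who hasn't hit this in practice, the difference is easy to reproduce with Python's standard library. This is only an analogy: `xml.etree.ElementTree` stands in for a strict XML parser and `html.parser` for a tolerant one. Neither is a browser, and `html.parser` does not implement the HTML5 algorithm. The helper names `strict_parse` and `lenient_parse` are made up for the sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = "<p>unclosed <b>bold</p>"  # <b> is never closed


def strict_parse(markup):
    """Return the root tag name, or None if the XML parser rejects the input."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError:
        # A well-formedness error is fatal: no partial tree is returned.
        return None


class EventCollector(HTMLParser):
    """Record whatever the tolerant parser reports, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def lenient_parse(markup):
    """Feed the input to the tolerant parser and return the events it saw."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


print(strict_parse("<p>fine</p>"))  # well-formed, so a root tag comes back
print(strict_parse(BROKEN))         # rejected outright
print(lenient_parse(BROKEN))        # still yields usable events
```

The strict side gives you nothing at all for the broken input, which is the "don't render" behavior being described here.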

Why are we okay with formats like PDF that have similarly catastrophic error handling?

I mean, we aren’t ok with that for PDF. That’s why PDF renderers have incredibly baroque rules for parsing weirdly or brokenly formatted documents, and why many PDF documents fall back to embedding images or absolute-positioned pixel-like layouts for compatibility purposes.
I mean, the linked page and the comment above say it is:

> It is explicitly forbidden for clients to accept any page that doesn't conform with the specification. This prevents the standardized diabolic rules that one must implement in order to correct a

I don't get this reply. GP didn't say anything about parsing algorithms, they said (correct) things about hard errors on the web.
Why for? The reply is about factual historical experience with webpage hard errors.

Would you like to have a law that forbids you, under penalty of fine, to read any book you buy or borrow that is lacking or has damaged pages?

I thought they were just bolstering the refutation of TFA's assertion that XHTML is strictly better because of its parsing algorithm.