Hacker News
by jokoon 1472 days ago
Are more strict html parsers/renderers, and aren't they faster?
3 comments

Lenient parsers still benefit from strict input because it lets them avoid lookaround/backtracking.
What do you mean by lookaround/backtracking? You're inside <p>. You encounter another <p>. You can't nest one <p> inside another <p>, so you close the current <p> and open a new <p>. That's about it. I fail to see where you would need any kind of backtracking.
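To make that concrete, here is a minimal sketch in Python. It is a toy, not a real HTML parser: `paragraphs` and `TOKEN` are made-up names, and it assumes the input contains only bare <p>, </p> and text. It only illustrates that the "close the open <p>, open a new one" rule can be applied in a single forward pass over the tokens.

```python
import re

# Hypothetical toy tokenizer: recognizes only <p>, </p>, and runs of text.
TOKEN = re.compile(r"<(/?)p>|([^<]+)")

def paragraphs(html):
    """Return the text of each <p>, closing an open <p> when another one starts."""
    result = []
    current = None  # text pieces of the currently open <p>, or None if none is open
    for match in TOKEN.finditer(html):
        closing, text = match.group(1), match.group(2)
        if text is not None:
            # Text outside any <p> is ignored in this toy.
            if current is not None:
                current.append(text)
        elif closing:
            # Explicit </p>: close the open paragraph, if there is one.
            if current is not None:
                result.append("".join(current))
                current = None
        else:
            # Start tag <p>: a <p> cannot nest inside a <p>,
            # so implicitly close the open one before opening the new one.
            if current is not None:
                result.append("".join(current))
            current = []
    if current is not None:
        result.append("".join(current))  # end of input closes whatever is still open
    return result
```

For example, `paragraphs("<p>one<p>two</p><p>three")` yields three paragraphs even though only one of them has an explicit end tag, and the loop never revisits a token.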
Well, even in this one example, imagine parser combinators, which often mean backtracking out of the inner <p> so that you can commit to the `openTag('p')` parser. Or your logic may be "consume all tags that aren't <p>", which is a lookahead.

A better example here is whether you are lenient and accept an unescaped special character like "<" in place of the entity "&lt;". If you require it to be escaped as "&lt;", or if all such characters in your inputs are always escaped, then your text parser never has to backtrack. But if you are lenient, your text parser can do catastrophic levels of backtracking if there is a single "<" somewhere (unless you are careful). Imagine input that starts off "<a small mouse once said". It could be quite a while before your parser knows it's not an anchor open tag.
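A rough sketch of that cost, again in Python and again a toy under stated assumptions: `tokenize_lenient` is a hypothetical name, a "tag" here is just "<", an optional "/", alphanumerics, then ">", and entities are not decoded. It counts how many characters get read once as a possible tag and then again as plain text.

```python
def tokenize_lenient(text):
    """Split text into ('tag', name) and ('text', chars) tokens, tolerating a bare '<'.

    Returns (tokens, rescanned), where rescanned counts characters that were
    read speculatively as a possible tag and then read again as plain text.
    """
    tokens, buf, rescanned = [], [], 0
    i = 0
    while i < len(text):
        if text[i] == "<":
            # Speculatively scan ahead for the '>' that would finish a tag.
            j = i + 1
            while j < len(text) and text[j] not in "<>":
                j += 1
            candidate = text[i + 1:j]
            if j < len(text) and text[j] == ">" and candidate.lstrip("/").isalnum():
                if buf:
                    tokens.append(("text", "".join(buf)))
                    buf = []
                tokens.append(("tag", candidate))
                i = j + 1
                continue
            # Not a tag after all: back up and keep '<' as literal text.
            rescanned += j - (i + 1)
        buf.append(text[i])
        i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens, rescanned
```

On "<a small mouse once said" the scan runs to the end of the input before giving up, so every character after the "<" is read twice. On the escaped form "&lt;a small mouse once said" the speculative branch is never entered. Stopping the lookahead at the next "<" is one way of being careful here; a parser that does not bound the lookahead like this can end up rescanning far more.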

> Are more strict html parsers/renderers, and aren't they faster?

Are what more strict? You're missing a subject there.

At a guess, you're referencing the differences between Chrome/Firefox rendering times? And are surprised that Chrome is always slower?

In the same completely unscientific stat-taking, I found that Chrome was significantly faster than Firefox at parsing the HTML head element of a document, and that difference was enough for Chrome to pull ahead of Firefox in overall rendering times for smaller pages. (Chrome spent about 30% of the time Firefox did in the head.)

However, Firefox was faster at parsing the body, and as I had a larger-than-usual body (50k words is not your average webpage), Firefox was overall faster.

To you and all who have responded: there is no variation in HTML parsing between browsers. All engines use precisely the same exhaustively defined algorithm. There is no leniency or strictness. Their performance characteristics may differ outside of parsing, which includes what they do with the result of parsing, but in the parsing itself there should be basically no difference between engines or parsers.