Hacker News
by general_failure 4053 days ago
> https://github.com/servo/html5ever is the largest parsing library we use in Servo, and there are a bunch of Rust tricks done there. HTML parsing is hard (the spec is insanely complex), and this library does it well with much less code.

This might simply be because a) Servo has no legacy, b) Servo developers are awesome, or c) Servo is not complete yet.

A large part of the complexity of HTML is simply quirks and compatibility, which Servo does not handle yet.

Don't mistake all this as language wins...

1 comment

> A large part of the complexity of HTML is simply quirks and compatibility.

Actually, the HTML spec already addresses those; that's why it is so incredibly complex. Unlike HTML4 and earlier versions, the WHATWG HTML Living Standard (and possibly the W3C HTML5 spec, though I can never keep straight which parts of the WHATWG spec the W3C kept and which it didn't) contains a complete specification of how compliant user agents should parse anything purporting to be HTML, even if it isn't actually valid HTML. IIRC, a compliant parser may throw an error on invalid HTML, but if it is tolerant of errors, the spec specifies exactly how it is to be tolerant, specifically to avoid the pre-HTML5 problem of different browsers parsing the same input in different ways. Modern browsers have either converged or are converging (some might still be lagging) on that consistent model.
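
To make "the spec specifies how it is to be tolerant" concrete, here is a toy sketch of what fixed recovery rules look like. It is not the WHATWG tree-construction algorithm, and the function name `recover` and its three rules are made up for illustration. The real spec's rules are far more involved and give different results for some inputs, such as misnested formatting tags. The only point is that when the recovery rules are written down, every implementation produces the same output for the same broken input.

```python
import re

# Toy illustration only: NOT the WHATWG parsing algorithm.
# Matches a simple start/end tag (no attributes) or a run of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


def recover(markup):
    """Return well-nested markup using three fixed, hypothetical recovery rules."""
    out = []
    open_tags = []  # stack of currently open element names
    for match in TOKEN.finditer(markup):
        closing, name, text = match.group(1), match.group(2), match.group(3)
        if text is not None:
            out.append(text)
            continue
        name = name.lower()
        if not closing:
            open_tags.append(name)
            out.append(f"<{name}>")
        elif name in open_tags:
            # Rule 1: an end tag closes everything opened after its match.
            while True:
                top = open_tags.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # Rule 2: an end tag with no matching open element is dropped.
    # Rule 3: at end of input, close whatever is still open.
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


if __name__ == "__main__":
    for broken in ["<b><i>x</b>y", "x</p>", "<p>a"]:
        print(repr(broken), "->", repr(recover(broken)))
```

Two independent implementations of these three rules would have to agree on every input, which is the property the comment above attributes to the HTML spec, just at a vastly larger scale.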