by lvillani 4366 days ago
> There isn't an obvious way to parse broken HTML, and every HTML parser does it differently

At least with HTML5, we have both a spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/) and a library to parse it (https://github.com/google/gumbo-parser).