by bhaak 3504 days ago
I know that. A long time ago I wrote an HTML parser that tried to make the most sense out of any HTML you threw at it. At one point, it was used to parse most of the Chinese websites that existed at the time to find neologisms.

So it was pretty robust, but yeah, you have to draw the line somewhere.

I think it's fine to be a bit lenient in what you accept, as long as it doesn't compromise the design of your program (parsing RFC 822 dates with localized weekdays, for example, would).
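
For illustration, here is a rough sketch of where that line might sit for dates. It is not the parser mentioned above, and everything in it is an assumption: the function name `parse_rfc822_date` is hypothetical, and it only handles four-digit years and numeric offsets. It tolerates stray whitespace, odd capitalization, a missing weekday, and missing seconds, but it refuses weekday names that are not the English abbreviations.

```python
import re
from datetime import datetime, timedelta, timezone

# English abbreviations only; localized names are deliberately not supported.
WEEKDAYS = {"mon", "tue", "wed", "thu", "fri", "sat", "sun"}
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Lenient about whitespace, optional weekday, and optional seconds.
DATE_RE = re.compile(
    r"^\s*(?:(?P<wday>[^\W\d_]+)\s*,\s*)?"
    r"(?P<day>\d{1,2})\s+(?P<mon>[A-Za-z]{3})\s+(?P<year>\d{4})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2})(?::(?P<s>\d{2}))?\s+"
    r"(?P<sign>[+-])(?P<oh>\d{2})(?P<om>\d{2})\s*$"
)


def parse_rfc822_date(text):
    """Parse an RFC 822 style date, accepting small deviations only."""
    match = DATE_RE.match(text)
    if match is None:
        raise ValueError(f"not an RFC 822 style date: {text!r}")

    # Leniency stops here: a weekday we do not know is an error, not a guess.
    # The weekday is not checked against the calendar date.
    wday = match.group("wday")
    if wday is not None and wday.lower() not in WEEKDAYS:
        raise ValueError(f"unsupported weekday name: {wday!r}")

    month = MONTHS.get(match.group("mon").lower())
    if month is None:
        raise ValueError(f"unsupported month name: {match.group('mon')!r}")

    offset = timedelta(hours=int(match.group("oh")), minutes=int(match.group("om")))
    if match.group("sign") == "-":
        offset = -offset

    return datetime(
        int(match.group("year")), month, int(match.group("day")),
        int(match.group("h")), int(match.group("m")), int(match.group("s") or 0),
        tzinfo=timezone(offset),
    )
```

With this sketch, `"  mon ,2 jan 2006 15:04 +0100 "` parses fine, while `"Mo, 02 Jan 2006 15:04:05 +0100"` raises `ValueError`. Supporting the second form would mean carrying weekday tables for every locale, which is the kind of cost that starts to shape the design.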

Anything that goes beyond that needs a very good reason.