by bhaak 3504 days ago
Great, after the tag soup of modern browsers, are we now also going to see JSON soup?

Sometimes it's obvious what's wrong with malformed data you receive. A classic would be encoding errors.

But as soon as you start supporting broken components and APIs, you will never be able to stop supporting them.

The prime example would be HTML. Granted, in the beginning it was supposed to be written by humans, but that quickly stopped being a major obstacle, and even a human can produce valid HTML with the help of a syntax checker.


I've written a relatively popular Atom/RSS feed parser for Go [0].

I struggled with this very issue but I ultimately ended up attempting to be robust against out-of-spec feeds. A super strict feed parsing library is less useful than one that can successfully parse certain classes of broken feeds.

It is a fine line to walk -- I won't add a great deal of complexity to support overly broken feeds, but if it is relatively simple to support certain types of common mistakes I'll do it.
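To make that trade-off concrete, here is a minimal Python sketch of the "strict first, then a short list of cheap fallbacks" idea, applied to feed dates. It is not how gofeed does it; the function name `parse_feed_date` and the fallback list are assumptions made up for illustration, and treating zone-less values as UTC is an arbitrary choice.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical, deliberately short list of out-of-spec layouts worth tolerating.
# Anything not covered here is rejected rather than guessed at.
FALLBACK_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",  # ISO 8601 style timestamp where an RFC 822 date belongs
    "%Y-%m-%d %H:%M:%S",    # no zone at all
    "%d %b %Y",             # date only
)


def parse_feed_date(text):
    """Return a timezone-aware datetime, or None if the value is too broken."""
    text = text.strip()

    # 1. Try the in-spec form first (RFC 822 / RFC 2822 style dates).
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):  # the exception type differs across Python versions
        parsed = None

    # 2. Fall back to the small list of cheap, well-understood mistakes.
    if parsed is None:
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue

    if parsed is None:
        return None

    # Assumption: a value without any zone information is treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

The point of keeping the fallbacks in one flat tuple is that the leniency stays visible and bounded: supporting one more common mistake is a one-line change, and anything that would need real logic is a signal to stop.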

[0] https://github.com/mmcdole/gofeed

I'm doing this with WebDAV too. When I come across a bug that's clearly an implementation problem, I weigh how prevalent the software is and how likely its authors are to fix it, and if possible I add a user-agent-specific workaround so new clients can't rely on the same bug with my server.
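A rough Python sketch of what a user-agent-gated workaround could look like. Everything in it is hypothetical: the client name `ExampleDAVClient`, the quirk name `accept_unescaped_href`, and the bug itself (sending unescaped spaces in an href) are invented for illustration and are not taken from any real WebDAV server.

```python
import re

# Hypothetical quirk table: each entry pairs a pattern for a known-buggy client
# release with the set of workarounds enabled for it.
KNOWN_CLIENT_QUIRKS = (
    (re.compile(r"^ExampleDAVClient/1\.[0-3]\b"), frozenset({"accept_unescaped_href"})),
)


def quirks_for(user_agent):
    """Return the workarounds enabled for this User-Agent (empty for unknown clients)."""
    for pattern, quirks in KNOWN_CLIENT_QUIRKS:
        if pattern.search(user_agent or ""):
            return quirks
    return frozenset()


def handle_href(href, user_agent):
    """Validate an href strictly unless the client is a listed offender."""
    if " " not in href:
        return href  # well-formed input: same path for every client
    if "accept_unescaped_href" in quirks_for(user_agent):
        return href.replace(" ", "%20")  # repair only for the listed client
    raise ValueError("unescaped space in href")  # everyone else gets the strict behavior
```

Because the default branch stays strict, a new client that makes the same mistake fails immediately instead of quietly coming to depend on the workaround.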
But then we add the IE nightmare of using an accepted user-agent in a new product to work around cases like this.
That nightmare had to do with misbehaving servers. IE had to advertise as Mozilla so servers would serve the better response.

In this case it would be possible for a client to fake a UA, but it's more likely that they weren't aware they were doing things incorrectly and will correct the behavior rather than opting in to mimicking a different UA to get the server to behave in a non-standard way.

I haven't seen this happen, and this is one of the most popular DAV implementations. I have seen people fix broken implementations as I've slowly been making the server more strict over the last 10 years.

I read those threads while I was first starting to write my parser.

I found it interesting. If you look in the thread, you'll see that this was a big disagreement between Pilgrim and Aaron Swartz.

I know that. A long time ago I wrote an HTML parser that tried to make the most sense out of any HTML you threw at it. At one point, it was used to parse most of the Chinese websites that existed at the time to find neologisms.

So it was pretty robust, but yeah, you have to draw the line somewhere.

I think that as long as it doesn't compromise the design of your program (for example, parsing RFC 822 dates with localized weekdays), it's fine to be a bit lenient in what you accept.

Anything that goes beyond that needs a very good reason.