| HN Mirror

by bhaak 3504 days ago

Great, after the tag soup of modern browsers, are we now also going to see JSON soup?

Sometimes it's obvious what's wrong with malformed data you receive. A classic would be encoding errors.

But as soon as you start supporting broken components and APIs, you will never be able to stop supporting them.

The prime example is HTML. Granted, in the beginning it was supposed to be written by humans, but that quickly stopped being a major obstacle, and even a human can produce valid HTML with the help of a syntax checker.

1 comments

drakenot 3504 days ago

I've written a relatively popular Atom/RSS feed parser for Go [0].

I struggled with this very issue but I ultimately ended up attempting to be robust against out-of-spec feeds. A super strict feed parsing library is less useful than one that can successfully parse certain classes of broken feeds.

It is a fine line to walk -- I won't add a great deal of complexity to support overly broken feeds, but if it is relatively simple to support certain types of common mistakes I'll do it.

[0] https://github.com/mmcdole/gofeed
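One common way to tolerate a simple class of feed mistakes is to try the strict date format first and then fall back to a short, fixed list of near-misses. The sketch below illustrates that idea only; it is not taken from gofeed, and `parseDateLenient` and the list of extra layouts are hypothetical names and choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// layouts lists the date formats to try, strictest first.
// The non-standard entries are illustrative assumptions about
// common mistakes, not a list taken from any real library.
var layouts = []string{
	time.RFC1123Z,                    // "Mon, 02 Jan 2006 15:04:05 -0700"
	time.RFC1123,                     // "Mon, 02 Jan 2006 15:04:05 MST"
	time.RFC3339,                     // Atom-style timestamp
	"Mon, 2 Jan 2006 15:04:05 -0700", // single-digit day of month
	"2 Jan 2006 15:04:05 -0700",      // weekday omitted
	"2006-01-02 15:04:05",            // no "T" separator, no zone
}

// parseDateLenient (hypothetical helper) returns the first successful
// parse, or an error if no layout matches. Keeping the fallback list
// short and explicit is what keeps the leniency bounded.
func parseDateLenient(s string) (time.Time, error) {
	s = strings.TrimSpace(s)
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized date format: %q", s)
}

func main() {
	for _, in := range []string{
		"Tue, 03 Feb 2015 10:00:00 +0000", // in spec
		"Tue, 3 Feb 2015 10:00:00 +0000",  // out of spec, still accepted
		"sometime last week",              // too broken, rejected
	} {
		t, err := parseDateLenient(in)
		fmt.Println(t, err)
	}
}
```

Anything that does not match one of the listed layouts is still rejected, so each new tolerated mistake is a deliberate one-line decision.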

treve 3504 days ago

I'm doing this with WebDAV too. When I come across a bug that's clearly an implementation problem, I weigh how prevalent the software is and how likely its authors are to fix it. If possible, I add a user-agent-specific workaround so that new clients can't rely on the same bug with my server.
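A rough sketch of what a user-agent-scoped workaround can look like, written in Go for illustration. Everything specific here is an assumption: the client name `ExampleDAVClient`, the "quoted Depth header" bug, and the helper names are all made up, and this is not code from any real DAV server.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// quirkQuotedDepth names a hypothetical client bug: the Depth header
// value arrives wrapped in double quotes, e.g. Depth: "1".
const quirkQuotedDepth = "quoted-depth"

// quirksByUserAgent maps a User-Agent prefix to the workarounds enabled
// for it. "ExampleDAVClient/1." is a made-up client for illustration.
var quirksByUserAgent = map[string][]string{
	"ExampleDAVClient/1.": {quirkQuotedDepth},
}

// hasQuirk reports whether the given workaround is enabled for this client.
func hasQuirk(userAgent, quirk string) bool {
	for prefix, quirks := range quirksByUserAgent {
		if !strings.HasPrefix(userAgent, prefix) {
			continue
		}
		for _, q := range quirks {
			if q == quirk {
				return true
			}
		}
	}
	return false
}

// normalizeDepth validates the Depth value strictly, except for clients
// listed as having the quoted-depth bug. Handling of a missing header is
// left out of this sketch.
func normalizeDepth(userAgent, depth string) (string, error) {
	if hasQuirk(userAgent, quirkQuotedDepth) {
		depth = strings.Trim(depth, `"`)
	}
	switch depth {
	case "0", "1", "infinity":
		return depth, nil
	}
	return "", fmt.Errorf("invalid Depth header %q", depth)
}

// handlePropfind shows where the check would sit in an HTTP handler.
func handlePropfind(w http.ResponseWriter, r *http.Request) {
	depth, err := normalizeDepth(r.UserAgent(), r.Header.Get("Depth"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "depth=%s\n", depth)
}

func main() {
	// The known-buggy client is tolerated; everyone else gets the strict path.
	fmt.Println(normalizeDepth("ExampleDAVClient/1.4", `"1"`))
	fmt.Println(normalizeDepth("SomeNewClient/0.1", `"1"`))
}
```

Because the leniency is keyed to one known client, a new implementation sending the same malformed header gets an error and has a reason to fix it.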

kr0 3504 days ago

But then we get the IE nightmare of a new product using an accepted user-agent to work around cases like this.

treve 3503 days ago

That nightmare had to do with misbehaving servers. IE had to advertise as Mozilla so servers would serve the better response.

In this case it would be possible for a client to fake a UA, but it's more likely that they weren't aware they were doing things incorrectly and will correct the behavior, rather than opting in to mimicking a different UA to get the server to behave in a non-standard way.

I haven't seen this happen, and this is one of the most popular DAV implementations. I have seen people fix broken implementations as I've slowly been making the server more strict over the last 10 years.

markrages 3504 days ago

Nothing new under the sun:

http://www.xml.com/pub/a/2003/01/22/dive-into-xml.html

drakenot 3504 days ago

I read those threads while I was first starting to write my parser.

I found it interesting: if you look in the thread, you'll see that this was a big disagreement between Pilgrim and Aaron Swartz.

bhaak 3504 days ago

I know that. A long time ago I wrote an HTML parser that tried to make the most sense out of any HTML you threw at it. At one point, it was used to parse most of the Chinese websites that existed at the time to find neologisms.

So it was pretty robust, but yeah, somewhere you should draw the line.

I think, as long as it doesn't compromise the design of your program (for example, parsing RFC 822 dates with localized weekdays), it's fine to be a bit lenient in what you accept.

Anything that goes beyond that needs a very good reason.
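To make the date example concrete, here is a small Go sketch under one reading of it: the weekday in an RFC 822-style date is redundant, so a parser can drop it instead of validating it and never needs to know any language's day names. `parseIgnoringWeekday` is a hypothetical helper, and the layout assumes a four-digit year and a numeric zone.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseIgnoringWeekday (hypothetical helper) parses an RFC 822-style date
// while ignoring the optional leading day name. Anything before the first
// comma is discarded, so "Di, ..." or "mar., ..." parse like "Tue, ...".
// Month names must still be English; localized months are not handled.
func parseIgnoringWeekday(s string) (time.Time, error) {
	s = strings.TrimSpace(s)
	if i := strings.Index(s, ","); i >= 0 {
		s = strings.TrimSpace(s[i+1:])
	}
	// Assumed layout: day, English month, 4-digit year, time, numeric zone.
	return time.Parse("2 Jan 2006 15:04:05 -0700", s)
}

func main() {
	for _, in := range []string{
		"Tue, 03 Feb 2015 10:00:00 +0100", // English weekday
		"Di, 03 Feb 2015 10:00:00 +0100",  // German weekday
		"03 Feb 2015 10:00:00 +0100",      // no weekday at all
	} {
		t, err := parseIgnoringWeekday(in)
		fmt.Println(t, err)
	}
}
```

The leniency stays local to one function and adds no locale tables, which is roughly what "doesn't compromise the design" could mean in practice.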
