by k2xl 3504 days ago
I'm hoping nobody actually does this in production. As an academic exercise it is interesting.

Maybe I'm old fashioned - I'm all for flexible APIs and all, but to its point. If a customer sends rotten stuff, it should just be rejected with a 40x code.

At minimum, check to make sure it is proper JSON... I know that a lot of stream processors will put it into a queue and 200 right away and then process in the background, but I don't think that ensuring it is at least JSON and doesn't have a content size of more than X could be too intensive.
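For what it's worth, here is a rough sketch of the kind of cheap gate I mean, run before the body ever reaches the queue. Everything in it is my own assumption for illustration: the `check_payload` name, the 1 MB default standing in for "X", and the choice of 413 and 400 as the specific 40x codes.

```python
import json

# Hypothetical limit standing in for "X"; pick whatever fits your service.
MAX_BODY_BYTES = 1_000_000


def check_payload(body: bytes, max_bytes: int = MAX_BODY_BYTES) -> int:
    """Return the HTTP status to send before queuing the body."""
    # Size check first: it is O(1) and avoids parsing oversized input.
    if len(body) > max_bytes:
        return 413  # payload too large
    try:
        # json.loads accepts bytes and raises a ValueError subclass
        # (JSONDecodeError or UnicodeDecodeError) on bad input.
        json.loads(body)
    except ValueError:
        return 400  # not parseable as JSON
    return 200  # fine to enqueue and process in the background
```

One caveat: Python's parser is a bit lenient by default (it accepts `NaN` and `Infinity`, for example), so this only catches input that is clearly broken.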

In this case, if the data was already accepted and you've got no choice but to deal with it, you've gotta do what you got to do. I've been there, and it ain't fun cleaning up a 900 GB JSON file.

2 comments

> Maybe I'm old fashioned - I'm all for flexible APIs and all, but to its point. If a customer sends rotten stuff, it should just be rejected with a 40x code.

In some fields, that's not an option. I do NMS engineering. If I need to set up monitoring for something, and the only source of the diagnostics I need is an endpoint that returns malformed JSON, I can't just throw my hands up and say "the data's in a shit format, I won't touch it". I'll have no choice but to get my hands dirty and parse out whatever I can because our systems need to be monitored.

I'm lucky in that the only times I had to deal with malformed JSON at this job, I was able to fix the program that was generating it because it was maintained by my team (the problem was that it was snarfing data from a database and sending it out as JSON but forgetting to escape tab characters, and unescaped tabs aren't allowed in JSON), but my luck's gonna run out some day.
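To illustrate the tab problem, here is a small sketch using Python's standard `json` module (not the program we actually fixed; the `note` field is made up). A strict parser rejects a raw tab inside a string, a proper encoder escapes it, and `strict=False` is an escape hatch if you can't fix the producer.

```python
import json

# "\t" here is a real tab character embedded in the JSON text,
# which is what the broken producer was emitting.
raw = '{"note": "a\tb"}'

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    # Control characters inside strings are rejected by default.
    strict_ok = False

# A correct encoder writes the tab as the two characters backslash + t.
encoded = json.dumps({"note": "a\tb"})

# If the producer cannot be fixed, strict=False tolerates raw
# control characters inside strings.
lenient = json.loads(raw, strict=False)
```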

I don't deal with such huge files. Honestly, what use case requires 900GB of JSON?

I got one for you. We have to upload JSON files that contain, for a bunch of articles, some encoded rules plus the legal text from the law that explains why the encoded rules are what they are.

The law part was supposed to be a few lines of text. Except when they don't know which article to give. In that case they provide the full law text, including scanned PDFs, base64 encoded. All 2GB of it. Basically you have something that means null, encoded in a huge string.

Now the creation of this file was given to a third party, who don't bother with finding out the relevant law, and paste the 2GB blob into every article they modify, just to be sure. At this point we have 500 000 articles in that file. We get a new one every month.

Not fun at all. But at least it is modern; in the past it was a COBOL flat file.

This looks like TheDailyWTF.com, but thanks.