by soxocx 959 days ago
> When sending an HTTP GET or HEAD request to LinkedIn for a specific profile [...]

> It will also be returned if there are too many HTTP requests in a single day. This is similar to the HTTP 429 Too Many Requests error message.

Similar? That is exactly what 429 was made for, isn't it? This is either weird or just lazy.

4 comments

I’m assuming the logic is that LinkedIn uses 999 in combatting crawler bots so they don’t want to give reasons why the request is denied. That would help the bot overcome the restriction. e.g. it might be because you’ve sent too many requests but it also might be that your user agent is blocklisted.

Obviously it’s bad practice to do this but I don’t think it’s a mystery why they’d want their denial to be opaque. I’m also having trouble getting that worked up about it. If it’s OK to return 418 as a joke it’s probably fine to return 999 to suspected crawlers.

The correct status code for that is 400, the generic "something you did is wrong" or 403, the generic "go away".
Sure, but I imagine you’d want to track real 400 errors (e.g. your POST request contains the wrong parameters) in analytics to ensure you haven’t introduced a bug in your code. Categorising bot-repellent responses along with that would likely be very noisy.

Again, I’m not really defending the practice, I think it’s bad, I can just clearly see how they ended up where they ended up.

Isn't that what 40x errors are for? E.g. I think there's 400 Bad Request for your POST example.
The canonical response code for "you did something wrong but I can't/don't want to say what exactly" is 400.

I think if you don't want to supply even that, a better way would be to just close the connection and not send anything back at all.

The only practical reason for a 999 error code I see is if you want to confuse the client about whether or not the response indicates an error at all. Maybe they were hoping some crawlers treat everything that's not 4xx or 5xx as "success" and so they can poison their index?

That thinking would be fairly naive though, as I think most HTTP clients treat everything that's not 2xx as an error.
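
To make the two client behaviours concrete, here is a minimal sketch. Both function names are made up for illustration and neither is taken from any real HTTP library. The first is the strict "anything outside 2xx is a failure" policy, the second is the lenient policy that a 999 would slip past.

```python
def is_error_strict(code: int) -> bool:
    """Hypothetical strict policy: anything that is not 2xx counts as a failure."""
    return not (200 <= code <= 299)


def is_error_lenient(code: int) -> bool:
    """Hypothetical lenient policy: only 4xx and 5xx count as failures."""
    return 400 <= code <= 599


def status_class(code: int) -> str:
    """Name the class implied by the first digit; 6xx-9xx have no defined class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")
```

Under the strict policy a 999 is simply a failed request. Under the lenient one it is not flagged as an error, which is the case where the index-poisoning idea could work.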

So the most likely reason is probably some programmer who went through the REST fanboy phase and thought they were special.

> I think if you don't want to supply even that, a better way would be to just close the connection and not send anything back at all.

I think this is the correct answer if a bad-mannered crawler has been identified. To take it a step further, one could do an HTTP version of an SSH tarpit[1], although that's likely taking things too far.

[1] https://nullprogram.com/blog/2019/03/22/
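
For the curious, a rough sketch of what an HTTP tarpit could look like with Python's asyncio: send a status line, then drip meaningless header lines forever so the response never completes. The names `junk_header`, `tarpit`, `main`, the 5-second delay, and the host and port are all assumptions chosen for illustration, not anything LinkedIn or the linked post is confirmed to use.

```python
import asyncio
import random

DELAY_SECONDS = 5.0  # assumed pause between header lines


def junk_header() -> bytes:
    """Build one syntactically valid but meaningless header line."""
    return b"X-%x: %x\r\n" % (random.getrandbits(32), random.getrandbits(32))


async def tarpit(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Send a status line, then drip headers so the header block never ends."""
    writer.write(b"HTTP/1.1 200 OK\r\n")
    try:
        while True:
            await asyncio.sleep(DELAY_SECONDS)
            writer.write(junk_header())
            await writer.drain()
    except OSError:
        pass  # the client gave up or the connection broke
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Accept connections and hand each one to the tarpit handler."""
    server = await asyncio.start_server(tarpit, host, port)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` would start it. A client with a sensible timeout escapes quickly, so this only costs the sloppiest crawlers anything.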

Right, but then the crawler devs will google this weird 999 code and handle it as a 429.
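
A sketch of what that handling might look like on the crawler side, assuming the crawler simply lumps 999 in with 429 and backs off exponentially. `retry_delay` and its parameters are hypothetical, only the delta-seconds form of `Retry-After` is handled, and whether a 999 response carries that header at all is not something this thread establishes.

```python
from typing import Mapping, Optional

# 999 is the non-standard code discussed here, treated like 429 (assumption).
THROTTLE_CODES = {429, 999}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 300.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not throttled."""
    if status not in THROTTLE_CODES:
        return None
    # Honour a numeric Retry-After if present (exact-case key lookup).
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return min(float(retry_after), cap)
    # Otherwise fall back to capped exponential backoff.
    return min(base * 2 ** attempt, cap)
```

Once a crawler does this, the opaque code buys the server nothing over a plain 429.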

If I wanted to mess with clients I don't like, I'd just return a random valid code.

I suspect they intentionally break the HTTP spec for a similar reason: it will break some standard crawlers/bots, stop automatic retries, and things like that. This is trivial to account for of course, but it'll stop some script kiddies.
They might also have desired it for "zero-cost" observability: everything already monitors HTTP status codes, so they can monitor bot traffic without custom instrumentation.
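
As a toy illustration of that observability argument: if bot rejections have their own status code, the share of blocked traffic falls out of the same status counts every monitoring stack already collects, and genuine 400s stay separate. `bot_block_rate` and the sample data in the usage note are invented for the example.

```python
from collections import Counter
from typing import Iterable

BOT_BLOCK_STATUS = 999  # the non-standard code discussed in this thread


def bot_block_rate(statuses: Iterable[int]) -> float:
    """Fraction of responses that carried the dedicated bot-block status."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return counts[BOT_BLOCK_STATUS] / total if total else 0.0
```

For example, `bot_block_rate([200, 200, 999, 400])` gives 0.25, with the real 400 counted as ordinary traffic rather than as a bot rejection.
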
Fun fact: ChatGPT (web version) returns 418 when you are able to get past Cloudflare but still get caught on their end. Very rare though.
I believe this is a meme making fun of LinkedIn for using the wrong status code…
I think the solution is to try again. Bad behavior of not following protocols should be rewarded with bad behavior of not following protocols.
Either a 429 in the latter case or a method not allowed (iirc 405) in the former. 429 I kinda get not having since it’s less than a decade old which seems to be the cutoff point for some larger companies, but 405’s been around for a long while.