by afavour 963 days ago
I’m assuming the logic is that LinkedIn uses 999 in combatting crawler bots so they don’t want to give reasons why the request is denied. That would help the bot overcome the restriction. e.g. it might be because you’ve sent too many requests but it also might be that your user agent is blocklisted.

Obviously it’s bad practice to do this, but I don’t think it’s a mystery why they’d want their denial to be opaque. I’m also having trouble getting that worked up about it. If it’s OK to return 418 as a joke, it’s probably fine to return 999 to suspected crawlers.

5 comments

The correct status code for that is 400, the generic "something you did is wrong", or 403, the generic "go away".
Sure, but I imagine you’d want to track real 400 errors (e.g. your POST request contains the wrong parameters) in analytics to ensure you haven’t introduced a bug in your code. Categorising bot-repellent responses along with those would likely be very noisy.

Again, I’m not really defending the practice. I think it’s bad; I can just clearly see how they ended up where they ended up.

Isn't that what the specific 4xx errors are for? E.g. I think there's 422 Unprocessable Content for your POST example.
The canonical response code for "you did something wrong but I can't/don't want to say what exactly" is 400.

I think if you don't want to supply even that, a better way would be to just close the connection and not send anything back at all.
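For illustration, here is a minimal sketch of "close the connection and send nothing" using Python's standard socketserver module. The names SilentDropHandler and make_server are made up for this example, and a real deployment would more likely do this at the proxy or firewall layer.

```python
import socketserver


class SilentDropHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: read the request, then hang up without replying."""

    def handle(self):
        # Drain whatever the client sent so the close is a clean FIN.
        self.request.recv(4096)
        # Returning without writing anything: socketserver then shuts down
        # and closes the socket, so the client sees EOF and no status line.


def make_server(host="127.0.0.1", port=0):
    """Build a TCP server that silently drops every connection (port 0 = any free port)."""
    return socketserver.TCPServer((host, port), SilentDropHandler)
```

A client talking to this server gets an empty read instead of a status code, which most HTTP libraries surface as a connection or protocol error.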

The only practical reason for a 999 error code I see is if you want to confuse the client about whether or not the response indicates an error at all. Maybe they were hoping some crawlers treat everything that's not 4xx or 5xx as "success" and so they can poison their index?

That thinking would be relatively naive though, as I think most HTTP clients treat everything that's not 2xx as an error.
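To make the difference concrete, here is a small sketch of the two checks being contrasted. Both function names are invented for this example and are not taken from any particular HTTP client.

```python
def is_error_range_check(status: int) -> bool:
    """The careless check: only 4xx and 5xx count as errors."""
    return 400 <= status <= 599


def is_error_non_2xx(status: int) -> bool:
    """The stricter check: anything outside 2xx counts as an error."""
    return not (200 <= status <= 299)


for code in (200, 404, 418, 999):
    print(code, is_error_range_check(code), is_error_non_2xx(code))
```

With the first check, 999 slips through as "not an error"; with the second it is rejected like any other non-2xx response.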

So the most likely reason is probably some programmer who went through the REST fanboy phase and thought they were special.

> I think if you don't want to supply even that, a better way would be to just close the connection and not send anything back at all.

I think this is the correct answer if a bad-mannered crawler has been identified. To take it a step further, one could do an HTTP version of an SSH tarpit [1], although that's likely taking things too far.

[1] https://nullprogram.com/blog/2019/03/22/
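For the curious, a rough sketch of what an HTTP tarpit could look like with asyncio: send a status line, then drip meaningless headers forever so the response never completes. The handler name, the delay, and the max_lines cap (there only so the sketch can terminate) are assumptions for this example.

```python
import asyncio
import random


async def tarpit_handler(reader, writer, delay=5.0, max_lines=None):
    """Hypothetical tarpit: start a response, then trickle junk headers slowly."""
    try:
        writer.write(b"HTTP/1.1 200 OK\r\n")
        sent = 0
        # With max_lines=None this loops until the client gives up.
        while max_lines is None or sent < max_lines:
            await asyncio.sleep(delay)
            header = b"X-%x: %x\r\n" % (random.getrandbits(32), random.getrandbits(32))
            writer.write(header)
            await writer.drain()
            sent += 1
    except (ConnectionResetError, BrokenPipeError):
        pass  # the client hung up, which is the expected way for this to end
    finally:
        writer.close()


async def serve(host="127.0.0.1", port=8080):
    """Run the tarpit until cancelled."""
    server = await asyncio.start_server(tarpit_handler, host, port)
    async with server:
        await server.serve_forever()
```

Each stuck client costs the server one idle coroutine, while the crawler holds a connection open waiting for headers that never end.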

Right, but then the crawler devs will google this weird 999 code and handle it as a 429.
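Something like the following is all it would take on the crawler side. This is a sketch with made-up names and an assumed exponential backoff policy, not code from any real crawler.

```python
from typing import Optional

# Assumption: the crawler treats the non-standard 999 exactly like 429.
RATE_LIMIT_CODES = {429, 999}


def retry_delay(status: int, attempt: int, base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no backoff applies."""
    if status not in RATE_LIMIT_CODES:
        return None
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * 2 ** attempt)
```

Here retry_delay(999, 3) gives 8.0 seconds, the same as it would for a 429.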

If I wanted to mess with clients I don't like, I'd just return a random valid code.
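A sketch of that idea, assuming "valid" means any code Python's http.HTTPStatus enum knows about. The function name decoy_status is invented here, and a real server would probably restrict the pool, since informational codes like 100 have protocol-level meaning.

```python
import random
from http import HTTPStatus
from typing import Optional


def decoy_status(rng: Optional[random.Random] = None) -> HTTPStatus:
    """Pick a random registered status code to send to an unwanted client."""
    rng = rng or random.Random()
    return rng.choice(list(HTTPStatus))


status = decoy_status()
print(status.value, status.phrase)
```

Because each response looks plausible in isolation, there is no single odd code for the crawler author to search for.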

I suspect they intentionally break the HTTP spec for a similar reason: it will break some standard crawlers/bots, stop automatic retries, and things like that. This is trivial to account for of course, but it'll stop some script kiddies.
They might also have desired it for "zero-cost" observability: everything already monitors HTTP status codes, so they can monitor bot traffic without custom instrumentation.
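As a rough sketch of that kind of monitoring, assuming the status codes have already been pulled out of an access log into a list (the function name and the input shape are assumptions for this example):

```python
from collections import Counter
from typing import Iterable


def bot_block_rate(statuses: Iterable[int], bot_code: int = 999) -> float:
    """Fraction of responses that carried the dedicated bot-denial code."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return counts[bot_code] / total if total else 0.0
```

For example, bot_block_rate([200, 200, 999, 400]) is 0.25, and the real 400 stays in its own bucket instead of being mixed in with bot denials.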
Fun fact: ChatGPT (web version) returns 418 when you manage to get past Cloudflare but still get caught on their end. Very rare, though.