by iddqd 3494 days ago
Maybe webhook providers could provide an endpoint where one could poll for events that failed to deliver.

The good APIs do, but it's still a loss for both sides.

a) The producer of the events has to store them in semi-permanent storage. I've been there and done that - failed webhooks result in a table of tens of millions of rows, even if each event is retained for only 48 hours. It's astounding how many events fail to process. And I've been through extensive verification that there is truly no problem on our side - it's always the client who is wrong. Emails go back and forth for weeks with the client screaming "it's your fault!" - only for us to finally receive an "oops, we found the problem on our end... sorry".
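To make the storage cost in (a) concrete, here is a minimal sketch of a failed-delivery store with a 48-hour retention window that a consumer could poll. Everything in it is hypothetical (`FailedEventStore`, `record`, `poll`, the in-memory dict standing in for a database table); it only illustrates the idea and is not how any particular API does it.

```python
import time
from dataclasses import dataclass

RETENTION_SECONDS = 48 * 60 * 60  # the 48-hour window from the comment above


@dataclass
class FailedEvent:
    event_id: str
    payload: dict
    failed_at: float  # Unix timestamp of the failed delivery


class FailedEventStore:
    """Hypothetical in-memory stand-in for the table of failed deliveries."""

    def __init__(self, retention=RETENTION_SECONDS):
        self.retention = retention
        self._events = {}  # client_id -> list of FailedEvent

    def record(self, client_id, event_id, payload, now=None):
        """Remember an event whose webhook delivery failed."""
        now = time.time() if now is None else now
        self._events.setdefault(client_id, []).append(
            FailedEvent(event_id, payload, now)
        )

    def poll(self, client_id, now=None):
        """Return the client's failed events still inside the retention window."""
        now = time.time() if now is None else now
        cutoff = now - self.retention
        fresh = [e for e in self._events.get(client_id, []) if e.failed_at >= cutoff]
        self._events[client_id] = fresh  # expired events are dropped here
        return fresh
```

Even in this toy form, every failed delivery costs a stored row until it expires, which is where the tens of millions of rows come from.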

b) Frankly, if the consumer of the events fails on a single webhook more than 5 times in a 24-hour period, that event is a permanent loss. It fails consistently because that specific event is a permanent failure to process on the consumer's side. It is probably throwing a 500 Internal Server Error or similar - every single time. 0.001% of webhook consumers actually have emergency alerts when webhooks fail on their end, so the job will continue to throw a silent/unlogged/unnoticed/ignored error no matter how many times you retry. These are the same type of developers who will never poll your "failure queue", because they don't even understand that their consumer endpoint throws 500 Internal Server Errors on 10% of your requests. You're trying to provide a service to developers who live in a fantasy world where errors and exceptions never happen on their end.
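The rule of thumb in (b) can be sketched as a capped retry loop. This is an illustration under assumptions, not a real delivery system: `send` is a hypothetical callable that returns an HTTP status code, and the backoff delays a real producer would put between attempts are left out.

```python
MAX_ATTEMPTS = 5  # the "more than 5 times" threshold from the comment above


def deliver_with_cap(send, event, max_attempts=MAX_ATTEMPTS):
    """Call send(event) until it returns a 2xx status or the cap is reached.

    Returns (delivered, statuses), where statuses lists every status seen.
    """
    statuses = []
    for _ in range(max_attempts):
        status = send(event)
        statuses.append(status)
        if 200 <= status < 300:
            return True, statuses
    # Cap reached: treat the event as a permanent failure, not a retry candidate.
    return False, statuses
```

A consumer that answers 500 every time burns all five attempts and ends up in the failure bucket, which is the case described above.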

It's a simple fact that developers who consume webhook requests are a disgrace. Chances are that if a request fails two times, it will never succeed. And yet the best APIs will try hundreds/thousands of times over a 24-hour period - simply to prove to that client that it is their fault that they are not processing webhooks properly. There is only so much a webhook producer can do. There is no magic we can do if the consumer is copy/pasting PHP snippets from Google or Stack Overflow.

Story time. The most memorable case I can remember is a client who was experiencing 100% webhook consumer failure for more than three weeks. The emails from their team - and subsequent phone calls from their CTO - were absolutely stunning; the verbal abuse was so bad that we were hounding our own business people to drop them as a client. Turns out they had a bunch of PHP developers who were, for some reason, writing their webhook consumer endpoint in C for the first time. They had sent us the custom "id" field as a JSON string, and their endpoint choked trying to parse it back as an integer. It hurts to even think about that case.
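For what it's worth, the consumer-side fix for that kind of bug is tiny: treat the "id" as an opaque string rather than assuming a numeric type. A hedged sketch in Python (the client in the story was writing C; `extract_id` is a hypothetical helper, not anything from their code or ours):

```python
import json


def extract_id(body: str) -> str:
    """Return the payload's "id" as an opaque string, whatever JSON type it had."""
    data = json.loads(body)
    raw = data["id"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (str, int)):
        raise ValueError(f"unsupported id type: {type(raw).__name__}")
    return str(raw)
```

With this, `extract_id('{"id": "abc-123"}')` and `extract_id('{"id": 42}')` both succeed, and anything else fails loudly instead of silently.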

tl;dr: Fuck webhook consumers. They're incompetent developers who don't know how to handle errors that are 100% their fault.

Funny aside: the most amusing cases come from PHP and .NET developers who expose their internal server errors in production. When you can copy/paste the response they gave you on a webhook because they are calling an undefined function or method... pure bliss.

You could also help customers who apparently have trouble connecting to your APIs by returning better errors (got type A, expected type B), providing client libraries, or offering more extensive support (for a price). Blaming the customer is easy; providing a way for even those "incompetent developers" to interface with you that is easy to understand and debug for all parties is hard.
The truly great developers find a better way than only retrying webhooks and prepare a client library that the customer can just plug into their code :-)
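As a rough sketch of the "got type A, expected type B" idea, a check like the following could produce the error text. `check_field` and the message wording are made up for illustration and don't come from any real API.

```python
def check_field(payload, name, expected_type):
    """Return None if payload[name] has the expected type, else a readable error."""
    if name not in payload:
        return f"field '{name}': missing, expected {expected_type.__name__}"
    value = payload[name]
    if not isinstance(value, expected_type):
        return (
            f"field '{name}': got {type(value).__name__}, "
            f"expected {expected_type.__name__}"
        )
    return None
```

For example, `check_field({"id": "42"}, "id", int)` returns `"field 'id': got str, expected int"`, which tells the other side exactly what to fix.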