Hacker News
by Diggsey 1423 days ago
This is rubbish: we've run with guaranteed webhook ordering for years, so the idea that you can't do it is laughable.

Timestamps don't solve the issue, and neither do "thin payloads" since the receiver has no idea how long to wait before assuming that the order is certain, and if you have a problem on the sender side it could cause logic errors for all of your clients.

Most of these problems are solved if the receiver doesn't process the webhook immediately, but instead queues it internally. You don't have issues with the queue being stalled due to one bad webhook, because there is no event-specific processing happening on the receiver (other than perhaps ignoring some events). The queue can still be stalled if there is a wider problem, but as soon as the problem is resolved, the system can catch up on those queued webhooks, and synchronization integrity is maintained.
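A minimal sketch of the "queue first, process later" receiver described above. The names `inbox`, `receive`, and `drain` are illustrative, and the in-memory deque is only a stand-in for whatever durable queue a real receiver would use.

```python
import json
from collections import deque

# Stand-in for a durable queue (assumption: a real receiver would persist this).
inbox = deque()


def receive(raw_body: bytes) -> int:
    """HTTP handler body: store the raw payload untouched and acknowledge.

    No event-specific logic runs here, so one bad event cannot make the
    endpoint fail and stall delivery.
    """
    inbox.append(raw_body)
    return 200


def drain(process) -> int:
    """Process queued webhooks one at a time, in arrival order.

    The head of the queue is removed only after `process` returns, so if
    processing raises, the event stays queued and order is preserved for
    the next attempt.
    """
    handled = 0
    while inbox:
        event = json.loads(inbox[0])
        process(event)
        inbox.popleft()
        handled += 1
    return handled
```

Because `drain` is a single consumer that never overlaps two events, processing order equals arrival order; running several workers against the same queue would give that up.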

Having said all that, if I were to design a new system I would go with a pull-based system instead. In this system, the client would request a range (start time, max count) of events via an HTTP request, and the response would include the "end time" that can be used in the next query. A "webhook" would contain an empty payload, and would simply indicate that the queue had become non-empty - this could be omitted entirely if realtime updates are not required, instead having the client poll.
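To make the shape of that pull API concrete, here is a rough sketch. `EventLog`, `fetch`, and the `"end"` field are made-up names, the store is in memory, and treating `start` as exclusive is an assumption rather than something specified above.

```python
class EventLog:
    """In-memory stand-in for the sender's ordered event store (hypothetical)."""

    def __init__(self):
        self._events = []  # list of (position, payload), ascending by position

    def append(self, position: int, payload: dict) -> None:
        # Positions define the total order, so they must strictly increase.
        if self._events and position <= self._events[-1][0]:
            raise ValueError("positions must be strictly increasing")
        self._events.append((position, payload))

    def fetch(self, start: int, max_count: int) -> dict:
        """Return up to max_count events with position > start.

        "end" is the position of the last event returned (or `start` if
        nothing is new); the client passes it as `start` next time.
        """
        batch = [(p, e) for p, e in self._events if p > start][:max_count]
        end = batch[-1][0] if batch else start
        return {"events": [e for _, e in batch], "end": end}
```

A consumer loops on `fetch(cursor, n)` and stores the returned `"end"` as its new cursor; replaying is just calling `fetch` again with an older cursor.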

The advantages of this approach are that it's easy for consumers to "replay" a set of events if they accidentally lose them, and it's also a lot more efficient, since many events can be sent per request (we gain some of this benefit at the moment by supporting "batch" webhooks containing multiple events, but it requires opt-in from the client.) Additionally, it allows webhooks to be versioned more easily, since you can have versioned endpoints for fetching events, and it also allows you to have an arbitrary number of consumers of the same set of events with no additional complexity.

2 comments

Author here.

You obviously CAN guarantee ordering, it's just that you can't guarantee it as the sender, you need cooperation from the receiver. Additionally, putting them in a receive queue on the receiver doesn't solve the issue unless the receiver takes extra care to also read from the queue in strict (non-overlapping) order which is rarely the case, and even then has significant throughput implications. So it really is all on the receiver. This piece was written from the context of the sender.

Timestamps definitely don't solve the issue, I explicitly said to use a centralized sequence number if you must (not a great idea in most cases). Thin payloads: the idea behind that is essentially to use the webhooks as a "please update" kind of notification and then you get the most recent data from the server. Essentially what you called a "pull system", it's a combination of both a push (webhook) to know when to pull, and the pull to get the data. This also doesn't work as nicely in many scenarios (because oftentimes, receivers want the data immediately without having to fetch), but it's good in others.

Please take a look at the content of the article (rather than just the title), I've addressed most of it there too.

I agree with the parent on the weirdness of how the problem is stated in the first place.

On your customer and card example, the issue is not message delivery order but processing order, or more precisely prerequisite satisfaction.

My first thought looking at it was to just store the data of any of the hooks coming in, check the prerequisites each time, and only process the whole when everything needed has arrived.
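Roughly what I have in mind, as a sketch only. The event type names (`customer.created`, `card.created`) and the field names are invented for illustration and are not taken from the article or any real API.

```python
class PrerequisiteBuffer:
    """Holds card events until the customer they depend on has arrived."""

    def __init__(self):
        self.customers = {}      # customer_id -> customer event
        self.pending_cards = {}  # customer_id -> card events still waiting
        self.processed = []      # (customer_id, card_id) in processing order

    def on_event(self, event: dict) -> None:
        cid = event["customer_id"]
        if event["type"] == "customer.created":
            self.customers[cid] = event
            # The prerequisite is now satisfied: flush anything that was waiting.
            for card in self.pending_cards.pop(cid, []):
                self._process_card(card)
        elif event["type"] == "card.created":
            if cid in self.customers:
                self._process_card(event)
            else:
                # Arrived before its customer: store it instead of failing.
                self.pending_cards.setdefault(cid, []).append(event)

    def _process_card(self, card: dict) -> None:
        self.processed.append((card["customer_id"], card["card_id"]))
```

Delivery order stops mattering here: the card is processed as soon as its customer is known, whichever of the two hooks lands first.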

Trying to dictate order from the sender without any cooperation from the receiver seems like a fool's errand, as in any real world scenario where it really matters, the receiver will also want a way to check it actually received everything in order.

I responded in much the same way below - your job as a sender is not to guarantee that the receiver will do its job in processing but to provide a reliable set of webhook messages so that if the receiver does fail, at least they can discover they've missed or skipped messages or are processing them out of order. As a sender, you certainly can provide guaranteed ordering or a way to identify the order of those messages. What you can't guarantee is that the receiver will process them in any given order if they choose to ignore the ordering you provide.
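As a sketch of what "discovering you've missed or skipped messages" can look like on the receiving side, assuming the sender attaches a consecutive integer sequence number to each message (the `seq` values and the `SequenceChecker` name are hypothetical):

```python
class SequenceChecker:
    """Tracks the last sequence number processed and classifies each arrival."""

    def __init__(self, last_seen: int = 0):
        self.last_seen = last_seen

    def check(self, seq: int) -> str:
        if seq <= self.last_seen:
            # Already processed, or older than what we have: safe to skip.
            return "duplicate_or_stale"
        if seq == self.last_seen + 1:
            self.last_seen = seq
            return "in_order"
        # Something between last_seen and seq has not arrived yet.
        # last_seen is left unchanged so the missing message can still be
        # accepted later; the caller decides whether to wait or re-request.
        return "gap"
```

The sender cannot force the receiver to act on a `"gap"` result, but without the sequence number the receiver could not detect it at all.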
I understand it may not have been very clear. The point, though, is that no one cares about delivery order; what they really care about is processing order. So ensuring delivery order doesn't help if the receiver then processes the events out of order.

As for relying on the customers to get ordering right: it's actually more involved and easier to get wrong than people realize, and it's better to avoid it altogether in how you design your API if possible.

Thanks for this, your post resonated with me. It’s good to know that most of what you did is what I ended up doing for a customer implementation (we’re using Odoo and Queue Job to bring in sales from Shopify), and Shopify doesn’t always guarantee the ordering of its order webhook payloads.

With pleasure. Happy to hear you found it helpful!

> In this system, the client would request a range (start time, max count) of events via an HTTP request, and the response would include the "end time" that can be used in the next query.

What happens if two transactions commit out of order? tx1 with a lower timestamp commits after tx2 with a higher timestamp has committed - and your client just saw tx2's timestamp.

Or if you have ≥$maxCount events with the exact same timestamp?

The timestamp in this case would be when the message was added to the queue, not the timestamp of the transaction which triggered the event.

If two transactions are non-causal, it doesn't matter which order the events arrive in the queue, but once the message is in the queue, the order is fixed.

> Or if you have ≥$maxCount events with the exact same timestamp?

Use a sufficiently precise timestamp that this doesn't happen, or add a counter in the low bits. The only reason to use a timestamp rather than a simple incrementing counter is to make it more convenient for recipients to re-request historical events (e.g. "I want to replay all events since yesterday") and to make debugging easier, since a bare counter value tells you nothing about when the event was queued.
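For illustration, one way to do "timestamp with a counter in the low bits". The 16-bit counter width, the millisecond resolution, and the names below are assumptions made for this sketch, not a description of our actual implementation.

```python
import time

COUNTER_BITS = 16  # assumed width of the low-bit counter


class PositionGenerator:
    """Issues strictly increasing queue positions derived from the clock."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = 0

    def next(self) -> int:
        millis = self._clock() // 1_000_000
        candidate = millis << COUNTER_BITS
        # If the clock has not advanced (or has gone backwards), bump the
        # previous position by one instead, which increments the counter bits.
        self._last = max(candidate, self._last + 1)
        return self._last


def position_to_millis(position: int) -> int:
    """Recover the approximate enqueue time, e.g. for 'replay since yesterday'."""
    return position >> COUNTER_BITS
```

If more than 2^16 events are queued within one millisecond, the counter carries into the timestamp bits; positions stay strictly increasing and the embedded time just runs slightly ahead of the wall clock.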

The timestamp is not meaningful for the actual event, its only purpose is to specify where this event sits in the total order.