by rubayeet 2079 days ago
Whoa! This is a game changer! I was looking into both Kafka and Google Pub/Sub for an event-oriented system my team was designing. Google Pub/Sub looked very promising, but the lack of an ordering guarantee was a deal breaker for us. I’ll consider this more strongly for the next system we build.

We used Pub/Sub very extensively (50B messages a day) but moved to Pulsar [0]. It performs equally well and has some nice features. And also no vendor lock-in.

[0] https://kesque.com/billions-of-events-a-day-without-breaking...

Pulsar seems operationally quite complex, as it has a dependency on both BookKeeper and ZooKeeper (which BK also needs). ZooKeeper is particularly notorious for being difficult. What's your experience been like?
It definitely is on the more complex side of management. That's why we partnered with Kafkaesque to do the maintenance for us. We were fine handling it ourselves but decided to outsource it as it's less critical for us than many other internal tasks.

They have an open ticket [0] to reduce the dependency on ZooKeeper, but as far as I know it's still pending.

[0] https://github.com/apache/pulsar/issues/572

Given cloud vendors' low respect for their customers, the proper stance is to own your own infrastructure.
The company I work for had the same stance 5 years ago. We regretted it a hundred times and now we stand the gaff. Nothing better than maintaining an EC2-based Cassandra cluster instead of simply using DynamoDB, huh...
I don't believe I advocated for the use of Cassandra. Owning vs. being owned: we are not arguing about the same things.

Stances are not strategies. When we use another's API we form a bond, but the other is free to break it, so we are automatically in a weaker position. We have acquiesced. But if we make a strategically worse choice, we have not only acquiesced, but done self-harm.

I understand. I’m just saying that in my professional experience, nothing good came out of religiously avoiding lock-in; quite the opposite.
With respect, clearly you cannot extrapolate that particular experience very far.

Otherwise, e.g., FOSS is "nothing good".

I agree with this. The best part of the cloud is abstracting away a lot of the basic maintenance of these applications. There is lock-in to some platform no matter what you do, but hopefully you can design your application so that if you do need to rearchitect, you can do it in phases.
What's a use case where strict ordering is critically important?
"What's a use case where strict ordering is critically important?"

In general, as the system grows: every use case where the developers did not, with quite non-trivial effort, make explicit and careful provision for ensuring that order is not important.

Even a lot of systems whose developers think they have no ordering dependencies are wrong in at least one subtle way without realizing it.

If you need to megascale, you're going to have to bite the bullet and build a system that can handle out-of-order, but there's a lot of systems out there where you don't need megascale, and you can get rid of that "quite non-trivial effort" to deal with out-of-orderness by asking for messages to arrive in order.

To get a sense of just how useful that can be... bear in mind that every time you open a TCP socket instead of a UDP one, you just made exactly that choice, to use an ordered message system when you didn't "need" one. Take a look at everything you do with a TCP socket and think about trying to run it over UDP, and not with something like QUIC that basically adds half of TCP back on it, but with UDP straight-up. Those are the kinds of things that can use in-order delivery... lots of things.

Almost everything can be simplified by guaranteed in-order delivery. It's just that some things can't afford the downsides.

I can understand your post, but I don't quite buy the TCP thing. I don't think anyone is using TCP for ordering, they're using it because they don't want their packet dropped.

I guess all of the systems I build are just built to assume no order, or to leverage causal ordering, because that feels much easier to reason about. Enforcing ordering feels really hard, and like something that a message bus can only do some of the work of.

> I can understand your post, but I don't quite buy the TCP thing. I don't think anyone is using TCP for ordering, they're using it because they don't want their packet dropped.

Think of (almost) any modern protocol built on top of TCP, and you'll see that ordering is critical (HTTP, SMTP, Telnet/SSH, etc.).

Gotcha, ok so specifically protocols built on TCP.
There's no ordering in HTTP. If you could send a whole HTTP request as a UDP packet you'd get exactly the same protocol (obviously sans WebSockets - but you could work around that).
There are numerous valid HTTP payloads that are larger than a single TCP/UDP packet.

Ordering is important in these cases.

How would you process an unordered response?
I'm not sure that ordering doesn't matter for most TCP data. For example, HTTP depends on ordering. Any time you are transmitting messages larger than the size of a packet, you need some degree of ordering, even if it's only to reconstruct the individual messages, when you don't care about the order of the messages.
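To illustrate the point about messages larger than one packet, here is a minimal sketch, assuming each packet carries a hypothetical sequence number alongside its payload (roughly the job TCP does for you; the packet layout and the `example.org` host are made up for illustration):

```python
def reassemble(packets):
    """Rebuild one message from (sequence_number, payload) pairs.

    The packets may arrive in any order; sorting by sequence number
    restores the byte order the sender intended.
    """
    in_order = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in in_order)


# Three fragments of a single HTTP request, received out of order.
arrived = [
    (2, b"Host: example.org\r\n\r\n"),
    (0, b"GET / "),
    (1, b"HTTP/1.1\r\n"),
]
request = reassemble(arrived)
print(request)
```

Without the sequence numbers, the receiver would have no way to tell which of the six possible concatenations is the real request.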
Consider processing two events without guaranteed ordering:

- Create A

- Delete A

In one ordering, A is created and then deleted as expected; in the other, the delete fails but then A is created and remains.
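A tiny sketch of the two outcomes, assuming a toy in-memory store where deleting a missing key is silently ignored (a real system might raise an error instead):

```python
def apply_events(events):
    """Apply (operation, key) events in the given order; return the final state."""
    state = set()
    for operation, key in events:
        if operation == "create":
            state.add(key)
        elif operation == "delete":
            # Assumption: deleting a key that does not exist is a no-op.
            state.discard(key)
    return state


in_order = apply_events([("create", "A"), ("delete", "A")])
reordered = apply_events([("delete", "A"), ("create", "A")])

print(in_order)   # set()
print(reordered)  # {'A'}
```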

There are a couple of options without needing guaranteed ordering:

- jobs can have ever increasing ids, workers record the last seen id in one place, and ignore jobs with ids less than last seen

- job results are returned for each job to a supervisor. If a job result doesn't match the current expected state, resend the job. Jobs should be idempotent in case a job is sent multiple times

If the create job is expensive, the latter solution would be less ideal, though.
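The first option might look roughly like this. It is only a sketch: the `Worker` class and its method names are hypothetical, and it uses `<=` rather than `<` so that duplicate deliveries are dropped as well.

```python
class Worker:
    """Applies jobs, ignoring any job whose id is not newer than the last one seen."""

    def __init__(self):
        self.last_seen_id = 0
        self.state = set()

    def handle(self, job_id, operation, key):
        """Return True if the job was applied, False if it was ignored as stale."""
        if job_id <= self.last_seen_id:
            return False
        self.last_seen_id = job_id
        if operation == "create":
            self.state.add(key)
        elif operation == "delete":
            self.state.discard(key)
        return True


worker = Worker()
# Jobs were issued as 1=create, 2=delete, but arrive in the opposite order.
results = [
    worker.handle(2, "delete", "A"),
    worker.handle(1, "create", "A"),
]
print(results, worker.state)
```

Note that the late create is never applied. The final state happens to match the in-order result here, but the skipped job is simply lost.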

Just ignoring tasks isn't helpful; it means you still process things out of order and just never get to the previous pending item in that case.

Much easier to just have strict order and process it off the line as it comes.

Creating ever increasing ids reliably at scale is not trivial. You will probably end up having a single server generating these ids which will then become a single point of failure.
(Or a distributed system - this is no trickier to migrate away from SPOF than anything else with global state.)
But when does the delete enter the queue and when does the Create enter the queue ;)
That doesn’t necessarily require that the entire queue is totally ordered, but the alternatives (such as Virtual Synchrony) are still considered arcane / research topics.
Kinda interesting, I built this just yesterday but with an async "buffer" and "write", and I just used a simple incrementing identifier system.
Does adding timestamps not handle this case?
Now you need to queue up events for some time, reorder them using the timestamp, and then process them. It’s possible, but has overhead in both performance and custom code you’ll have to maintain. If there is no guarantee of order, two separate systems consuming those same events might also get different results, depending on the implementation, which can be problematic.
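For a sense of the custom code involved, here is a minimal sketch of such a reorder buffer. The `ReorderBuffer` class, the `delay` parameter, and the integer timestamps are all assumptions for illustration; it also assumes no event arrives more than `delay` time units late, which a real system cannot always promise.

```python
import heapq


class ReorderBuffer:
    """Holds events for `delay` time units, then releases them in timestamp order."""

    def __init__(self, delay):
        self.delay = delay
        self._heap = []
        self._counter = 0  # tie-breaker so equal timestamps keep arrival order

    def push(self, timestamp, event):
        heapq.heappush(self._heap, (timestamp, self._counter, event))
        self._counter += 1

    def pop_ready(self, now):
        """Return every buffered event whose timestamp is at least `delay` old."""
        ready = []
        while self._heap and self._heap[0][0] <= now - self.delay:
            _, _, event = heapq.heappop(self._heap)
            ready.append(event)
        return ready


buffer = ReorderBuffer(delay=5)
buffer.push(12, "delete A")  # arrives first, but was generated later
buffer.push(10, "create A")

early = buffer.pop_ready(now=14)  # nothing is old enough yet
later = buffer.pop_ready(now=20)  # both released, oldest first
print(early, later)
```

The cost is visible even in this toy: every event waits at least `delay` before it is processed, and anything arriving later than that still comes out in the wrong order.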
For a single process on one box with one thread you can use something like that.

If you involve more than one box, that goes out the window. Sometimes you can still get 'one timestamp' by making something else the owner of the timestamp. It also depends on your time resolution and on the process that does the ingesting. For example, if that ingest process has more than one thread to handle things, you can still get out-of-order or identical timestamps if it is not coded correctly.

If events are generated by different processes you cannot really guarantee that time is exactly the same for them, unless you do something fancy to ensure that.
Interesting. Is the ordering here by when the event was generated, or by when the event entered the queue? I think the latter, and so I think the examples here don’t apply without something on top and a trade-off.
The queue entrypoint is not always the same process either, especially in a system like Pub/Sub.
With a single producer and consumer, yes - but of course that's seldom the case.

With multiple producers and consumers, clock skew would be an issue, with the time on different machines being off from each other slightly.

One option is to use a single source for generating IDs, but that introduces another failure point, and comes at a hefty performance cost.

> Does adding timestamps not handle this case?

If you have one message source (a single thread or some kind of coordination), and the messages have lower frequency than the timestamp resolution, yes.

The farther you get from that, the more the answer is no.

Just make the source part of the timestamp?

(1,A) < (1,B) < (2,A) etc.

Also use serial numbers instead of true timestamps.
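As a rough sketch of that idea, assuming each message carries a per-source serial number and a source name (the message layout here is hypothetical), Python's tuple comparison already gives the lexicographic order described above:

```python
# Each message is (serial_number, source, payload).
messages = [
    (2, "A", "third"),
    (1, "B", "second"),
    (1, "A", "first"),
]

# Order by serial number first, then by source name to break ties.
ordered = sorted(messages, key=lambda message: (message[0], message[1]))
print([payload for _, _, payload in ordered])  # ['first', 'second', 'third']

# The comparison from the comment above holds for plain tuples:
print((1, "A") < (1, "B") < (2, "A"))  # True
```

This yields a deterministic total order that every consumer agrees on, but it says nothing about whether (1, "B") actually happened after (1, "A").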

That way lies madness, and/or eventually accidentally writing your own Dynamo-paper database.
That won’t work in all cases. For instance, if you get messages from devices which can be reimaged they may have clock skew in a period of time before they’re synchronized again.
But in any case you can't rely on the order of message ingress to your system to represent anything meaningful either? It would have to ensure that the key for defining order would have some hard logical ordering purpose for which time is not relevant or useful.
The order of message ingress can still be meaningful even if device clocks are skewed or jump due to rebooting, reimaging, network time sync, frequency drift, etc.

A hard logical order arises from interactions. E.g. if the device receives a message, does something locally, goes through a clock change, and then sends a message dependent on one it fetched earlier, that's a logical order with out-of-order clock.

Or if a device gets a message, processes, sends something to another device, that one processes too then sends another message back to the original source, there's a logical order but with three different clocks. Even if the clocks are synchronised, there will be some drift and the messages may be processed fast enough that the drift puts their timestamps out of order.
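One well-known way to capture this kind of interaction order without trusting wall clocks is a Lamport logical clock. The comment does not name it, so treat the following as an illustrative sketch with a hypothetical `Device` class, not as what the commenter had in mind:

```python
class Device:
    """A device that stamps events with a Lamport logical clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def send(self):
        """Advance the clock and return the stamp carried by the outgoing message."""
        self.clock += 1
        return self.clock

    def receive(self, stamp):
        """Jump past the sender's stamp so the receive is ordered after the send."""
        self.clock = max(self.clock, stamp) + 1
        return self.clock


a = Device("A")
b = Device("B")

sent_by_a = a.send()                 # A sends a message to B
received_by_b = b.receive(sent_by_a)
sent_by_b = b.send()                 # B processes it and replies
received_by_a = a.receive(sent_by_b)

print(sent_by_a, received_by_b, sent_by_b, received_by_a)  # 1 2 3 4
```

The stamps increase along the chain of interactions regardless of what each device's wall clock says, which is the "hard logical order" described above.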

every use case where you haven't proven it isn't.

human conscious thought is local and single-threaded. it takes a lot of experience and training to be able to intuitively reason about non-local multi-threaded computation. if you're smart and humble you can try to simplify the problem by making individual messages independent from each other by e.g. employing redundancy but you still have to be aware that it's even a problem to begin with.

I see it a lot when integrating legacy healthcare systems that effectively operate on state-transition queues with an assumption of in-order processing.
Financial transactions (withdrawing/depositing money or placing stock orders). Credit card/payment use cases.
What did you pick? I'm interested in non-G solutions.
Probably Kafka according to his comment.
If you are still looking, I would recommend Solace PubSub+: https://solace.com/products/event-broker/software/

Supports zero message loss, no headaches around topic partitions, in-order messaging, support for open APIs/protocols, in-memory AND persistent quality of service, support for event mesh, etc.