Hacker News
by byteofbits 2087 days ago
This is definitely a surprising, if welcome, development from GCP. We used to be pretty significant users of Pub/Sub but migrated to Cloud Tasks after several discussions with our account manager indicated this wasn't a direction they wanted to take Pub/Sub in.

The implementation here also seems somewhat unusual at first glance; in particular, a retry of any given task appears to also retry every subsequent message with the same "ordering key".

I do wonder what use cases this is targeted at that wouldn't require pretty extensive work on the application side to ensure good idempotency. Does anyone have ideas about problems this would solve in and of itself?


The ordering keys feature supports a large number of keys. Throughput is limited to 1 MB/sec per key, but that should be enough for many applications to scale on any given key.

Imagine you have an order processing system where you have to 1) write to a database, 2) write to a metrics log, and 3) send an email to the customer. You can publish each step as a message whose ordering key is the user who initiated the order. This guarantees that subscribers see message 1 before message 2, and message 2 before message 3.
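A minimal in-memory sketch of the per-key ordering guarantee in that example. `OrderedTopic`, `publish`, and `deliver_next` are hypothetical names, not the Cloud Pub/Sub client API; the only idea modeled is that each ordering key behaves like its own FIFO queue, so one user's steps come out in publish order while other users' keys are independent.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    ordering_key: str
    data: str


class OrderedTopic:
    """Toy model of per-key ordering (hypothetical; not the real Pub/Sub API)."""

    def __init__(self):
        # One FIFO queue per ordering key; keys never block each other here.
        self._queues = defaultdict(deque)

    def publish(self, data, ordering_key):
        self._queues[ordering_key].append(Message(ordering_key, data))

    def deliver_next(self, ordering_key):
        """Return the oldest undelivered message for this key, or None."""
        queue = self._queues[ordering_key]
        return queue.popleft() if queue else None


topic = OrderedTopic()
for step in ["write_db", "write_metrics", "send_email"]:
    topic.publish(step, ordering_key="user-42")   # hypothetical user ID
topic.publish("write_db", ordering_key="user-7")  # a different user's order

while (msg := topic.deliver_next("user-42")) is not None:
    print(msg.data)  # write_db, then write_metrics, then send_email
```

Keying by user rather than using one global key is what spreads load across many keys, which is how the per-key throughput limit stays out of the way.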

You do have to account for possible message redeliveries. In this example, you can 1) write to the database using the order's unique ID (to prevent duplicate rows), 2) tolerate duplicates in the metrics log, since a bit of duplication is okay (or have a job that removes duplicates offline later), and 3) accept that a customer may occasionally get the same email twice (pretty harmless). You could also keep a side-cache of processed messages to reduce duplicate processing, but that's a bit heavy and may not be necessary.
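A toy sketch of those three duplicate-handling strategies, plus the optional side-cache. `OrderSystem`, `handle`, and the step names are hypothetical; a plain dict stands in for the database and lists stand in for the metrics log and the sent-email outbox. Delivering the same three messages twice mimics a redelivery.

```python
from dataclasses import dataclass, field


@dataclass
class OrderSystem:
    """Hypothetical handlers for the three order-processing steps."""
    orders: dict = field(default_factory=dict)   # stand-in DB, keyed by unique order ID
    metrics: list = field(default_factory=list)  # duplicates tolerated (or cleaned offline)
    emails: list = field(default_factory=list)   # a repeated email is acceptable
    processed: set = field(default_factory=set)  # optional side-cache of message IDs

    def handle(self, message_id, step, order_id, use_cache=False):
        if use_cache and message_id in self.processed:
            return  # seen before: skip the duplicate entirely
        if step == "write_db":
            # Upsert by order ID: a redelivery overwrites the row instead of adding one.
            self.orders[order_id] = {"status": "placed"}
        elif step == "write_metrics":
            self.metrics.append(order_id)
        elif step == "send_email":
            self.emails.append(order_id)
        if use_cache:
            # Marked only after the step ran, so if a step raises, a redelivery retries it.
            self.processed.add(message_id)


messages = [("m1", "write_db", "order-1"),
            ("m2", "write_metrics", "order-1"),
            ("m3", "send_email", "order-1")]

plain = OrderSystem()
for msg in messages + messages:  # second pass simulates a redelivery
    plain.handle(*msg)
print(len(plain.orders), len(plain.metrics), len(plain.emails))  # 1 2 2

cached = OrderSystem()
for msg in messages + messages:
    cached.handle(*msg, use_cache=True)
print(len(cached.orders), len(cached.metrics), len(cached.emails))  # 1 1 1
```

Without the cache, only the database write is deduplicated; the cache removes the rest, at the cost of storing (and eventually expiring) message IDs somewhere durable in a real deployment.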

What Cloud Pub/Sub with ordering keys gets you in this scenario is 1) durability of published messages 2) scalability across keys 3) ordering between messages in a key 4) retries in case one step fails 5) buffering in case your subscribers are slow or down 6) a fully hosted service (no dealing with your own cluster, scales automatically) 7) global availability (no need to shard your subscription by region, simplifying your app).

Disclaimer: I work on Cloud Pub/Sub, but this explanation is my own.

Thanks for this explanation. Can you give a similarly concrete example for this part of the docs: "When you receive messages in order and the Pub/Sub service redelivers a message with an ordering key, Pub/Sub maintains order by also redelivering the subsequent messages with the same ordering key. The Pub/Sub service redelivers these messages in the order that it originally received them." I'm a little confused about what scenario involving ordering would require re-sending multiple messages.
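If you're using Pub/Sub to propagate state, this lets you replicate with eventual consistency. Redelivering the later messages too means a replayed older update is always followed by the newer ones, so a replica never ends up stuck on a stale value. (Think database replication, instant messaging, video game scores, etc.)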
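A toy model of that replication argument, assuming each message is a "set this value" update and that the subscriber had already applied the later updates by the time the earlier message's acknowledgement was lost. `redeliver` and `replicate` are hypothetical helpers. Redelivering only the failed message replays a stale value last; redelivering it together with the subsequent messages for the same key ends on the newest value.

```python
def redeliver(log, failed_index, include_subsequent):
    """Full delivery sequence: the original pass plus the redelivery."""
    if include_subsequent:
        replay = log[failed_index:]   # failed message and everything after it, in order
    else:
        replay = [log[failed_index]]  # only the failed message
    return log + replay


def replicate(deliveries):
    """Apply each (key, value) update in delivery order; the last write wins."""
    state = {}
    for key, value in deliveries:
        state[key] = value
    return state


# Score updates for one player, all published with the same ordering key.
log = [("player-1", 10), ("player-1", 20), ("player-1", 30)]

print(replicate(redeliver(log, 1, include_subsequent=False)))  # {'player-1': 20}
print(replicate(redeliver(log, 1, include_subsequent=True)))   # {'player-1': 30}
```

The second result corresponds to the documented behavior: some updates are applied twice, but because they are replayed in their original order, the replica converges on the latest state.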