by hcm 4661 days ago
Very interesting. How exactly does the retry-later logic work? Does it just push the message onto a 'deferred' queue that you process manually at your convenience?

Totally agree with your last paragraph, introducing a message queue has solved a lot of problems for us.


RabbitMQ queues support two features that can be combined to implement a deferred retry queue, and RabbitMQ will do all the work for you.

The first is a "message-ttl". This tells RabbitMQ to discard messages after a specified number of milliseconds. The second is a "dead letter queue". Messages that are discarded from a queue can be routed to a dead letter queue automatically.

When we have a job that we wish to "retry later", the framework re-queues the message in a secondary queue whose name is derived from the original. For example, if the original queue was "prod-emailer", the derived queue name might be "prod-emailer-1m", indicating that this queue holds messages originally bound for prod-emailer that have been delayed by 1 minute.
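A minimal Python sketch of that naming scheme. The suffix format ("1m", "30s", "1500ms") is an assumption extrapolated from the single "-1m" example; the real framework's rule isn't described.

```python
def delayed_queue_name(original_queue: str, delay_ms: int) -> str:
    """Derive a delay-queue name such as "prod-emailer-1m" (hypothetical format)."""
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    # Use the largest whole unit so the name stays short and readable.
    if delay_ms % 60_000 == 0:
        suffix = f"{delay_ms // 60_000}m"
    elif delay_ms % 1_000 == 0:
        suffix = f"{delay_ms // 1_000}s"
    else:
        suffix = f"{delay_ms}ms"
    return f"{original_queue}-{suffix}"
```

Encoding the delay in the name means each distinct delay gets its own queue, so every message in a given delay queue shares the same TTL.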

This delayed queue is configured with x-dead-letter-exchange set to the original exchange, x-dead-letter-routing-key set to the original routing key, and x-message-ttl set to 60,000 (milliseconds). With this configuration, RabbitMQ handles the timeout automatically: when a message expires from the -1m queue, RabbitMQ republishes it to the original exchange, and the pre-existing bindings route it to the intended queue.
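A minimal sketch of that configuration, assuming a pika-style channel (`queue_declare(queue=..., durable=..., arguments=...)` and `basic_publish(exchange=..., routing_key=..., body=...)`). `durable=True` is an assumption. So is `defer()`, which places a message in the delay queue by publishing through the default exchange (""), since the default exchange routes to the queue named by the routing key. The exchange `prod` and routing key `emailer` used in the test are hypothetical.

```python
from typing import Any, Dict


def delay_queue_arguments(exchange: str, routing_key: str, ttl_ms: int) -> Dict[str, Any]:
    """Queue arguments that send expired messages back to the original exchange."""
    return {
        "x-dead-letter-exchange": exchange,        # where expired messages are republished
        "x-dead-letter-routing-key": routing_key,  # routing key used when republishing
        "x-message-ttl": ttl_ms,                   # queue-wide TTL in milliseconds (integer)
    }


def declare_delay_queue(channel, delay_queue: str, exchange: str,
                        routing_key: str, ttl_ms: int) -> None:
    """Declare the delay queue. Nothing consumes from it; RabbitMQ dead-letters
    each message when its TTL expires, and the original bindings deliver it."""
    channel.queue_declare(
        queue=delay_queue,
        durable=True,  # assumption: survive broker restarts
        arguments=delay_queue_arguments(exchange, routing_key, ttl_ms),
    )


def defer(channel, delay_queue: str, body: bytes) -> None:
    """Hypothetical helper: the default exchange ("") routes by queue name."""
    channel.basic_publish(exchange="", routing_key=delay_queue, body=body)
```

Because the TTL lives on the queue rather than on each message, every message in "prod-emailer-1m" waits the same 60 seconds.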

The framework expects all messages to be wrapped in a JSON "envelope", which lets us annotate the jobs. When we mark a job for retry, we also increment an "attempt-count" attribute in the JSON. The workers can then implement their own "retry N times" policies.
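A minimal sketch of such an envelope and a worker-side retry policy. Only the "attempt-count" attribute comes from the description above. The "payload" key and the helper names are hypothetical.

```python
import json
from typing import Any, Dict


def wrap(payload: Any) -> bytes:
    """Put a job payload in a JSON envelope with an attempt counter."""
    return json.dumps({"attempt-count": 0, "payload": payload}).encode("utf-8")


def mark_for_retry(body: bytes) -> bytes:
    """Return a new envelope with "attempt-count" incremented by one."""
    envelope: Dict[str, Any] = json.loads(body)
    envelope["attempt-count"] = envelope.get("attempt-count", 0) + 1
    return json.dumps(envelope).encode("utf-8")


def should_retry(body: bytes, max_retries: int) -> bool:
    """Retry N times: allow a retry only while fewer than N have happened."""
    return json.loads(body).get("attempt-count", 0) < max_retries
```

A worker that fails a job would call `should_retry` first, then either `mark_for_retry` and send the result to the delay queue, or give up on the job.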

I haven't thought about how this would work if we were using topic exchanges. We are only using direct exchanges at the moment.

I wrote a similar system in C#. We had three queues per application: work, delay, and error. In our system, the deferred (delay) queue used per-message TTLs, and expired messages were pushed back onto the work queue. This let us inspect the deferred and error queues while the application was running.
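A Python sketch of the three-queue layout with per-message TTLs, assuming a pika-style channel. The `{app}-work` / `{app}-delay` / `{app}-error` names, `durable=True`, and dead-lettering through the default exchange ("") are assumptions, not details of the C# system.

```python
from typing import Dict


def declare_app_queues(channel, app: str) -> Dict[str, str]:
    """Declare work, delay and error queues for one application.

    The delay queue has no queue-wide TTL. Each message carries its own
    expiration, and expired messages are dead-lettered via the default
    exchange ("") back onto the work queue.
    """
    names = {kind: f"{app}-{kind}" for kind in ("work", "delay", "error")}
    channel.queue_declare(queue=names["work"], durable=True)
    channel.queue_declare(queue=names["error"], durable=True)
    channel.queue_declare(
        queue=names["delay"],
        durable=True,
        arguments={
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": names["work"],
        },
    )
    return names


def expiration_for(delay_ms: int) -> str:
    """Per-message TTL for the AMQP "expiration" property: a string of milliseconds."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return str(delay_ms)
```

With pika, a deferred message would be published as `channel.basic_publish(exchange="", routing_key=names["delay"], body=body, properties=pika.BasicProperties(expiration=expiration_for(30_000)))`. One trade-off compared with a queue per delay: RabbitMQ expires per-message-TTL messages only when they reach the head of the queue. A short-delay message can therefore sit behind a long-delay one and wait longer than its own TTL.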
Thanks for posting this. Some of these ideas could work very nicely in Hutch.
No problem. Frustratingly, people rarely seem to talk about how they use RabbitMQ in practice. Also, a lot of things are still a mystery to me (for example, best practices for dealing with server shutdowns, or what to do when a NACK "fails", etc.).

I need to poke around Hutch and steal some ideas for an RPC implementation. We're only using fire-and-forget work for now, but I'd love to use RabbitMQ with anonymous reply queues as a workaround for PHP not being able to make asynchronous RPC calls.