Hacker News
by adrinavarro 3113 days ago
This looks promising.

I'm currently dealing with a queuing-related issue.

I have a series of tasks running across servers that consume from a queue and run a task.

Often, these tasks die mid-execution (but can be resumed by any other server). So, the queue is a database, and the running tasks "touch" a timestamp in the database if they are still executing. When a database document hasn't been updated for a while, the "consumption query" makes it so that it is 'redelivered' to an available server listening to the "queue".

Of course, this is subpar, but we haven't yet come across an elegant (and not too over-engineered) way to replace this.
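
For readers trying to picture the setup described above, here is a minimal sketch of a timestamp-based lease queue. It assumes SQLite as a stand-in for the database, and the table layout, the `LEASE_SECONDS` value, and the function names `claim`, `heartbeat`, and `finish` are all hypothetical, not the poster's actual schema.

```python
import sqlite3

LEASE_SECONDS = 300  # assumed: how long a task may go without being "touched"


def make_db():
    """Create an in-memory task table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " owner TEXT,"
        " touched_at REAL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def claim(db, worker, now):
    """Claim one unfinished task that is unowned or whose lease has expired."""
    cutoff = now - LEASE_SECONDS
    with db:  # commits on success, rolls back on error
        row = db.execute(
            "SELECT id, payload FROM tasks"
            " WHERE done = 0 AND (owner IS NULL OR touched_at < ?)"
            " ORDER BY id LIMIT 1",
            (cutoff,),
        ).fetchone()
        if row is None:
            return None
        # Re-check the condition in the UPDATE so two workers cannot both win.
        cur = db.execute(
            "UPDATE tasks SET owner = ?, touched_at = ?"
            " WHERE id = ? AND done = 0 AND (owner IS NULL OR touched_at < ?)",
            (worker, now, row[0], cutoff),
        )
        return row if cur.rowcount == 1 else None


def heartbeat(db, task_id, worker, now):
    """Touch the timestamp; returns False if the lease was lost to another worker."""
    with db:
        cur = db.execute(
            "UPDATE tasks SET touched_at = ? WHERE id = ? AND owner = ? AND done = 0",
            (now, task_id, worker),
        )
        return cur.rowcount == 1


def finish(db, task_id, worker):
    """Mark the task as explicitly finished so it is never redelivered."""
    with db:
        cur = db.execute(
            "UPDATE tasks SET done = 1 WHERE id = ? AND owner = ?",
            (task_id, worker),
        )
        return cur.rowcount == 1
```

Passing `now` in explicitly keeps the sketch easy to test. A worker whose `heartbeat` returns False has lost its lease and should stop working on that task.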

5 comments

> we haven't yet come across an elegant (and not too over-engineered) way to replace this.

It's built into some RDBMS. SQL Server has READPAST [1, 2], so you can do:

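    BEGIN TRANSACTION;
    -- DELETE has no ORDER BY, so pick the oldest unlocked row in a CTE first
    WITH NextItem AS (
        SELECT TOP (1) * FROM QueueTable WITH (ROWLOCK, READPAST) ORDER BY QueueId
    )
    DELETE FROM NextItem OUTPUT deleted.*;
    -- Do your work
    COMMIT TRANSACTION;

And if your process dies midway through, the transaction is rolled back and the row is immediately visible to another worker.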

[1] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-tra... READPAST is primarily used to reduce locking contention when implementing a work queue that uses a SQL Server table. A queue reader that uses READPAST skips past queue entries locked by other transactions to the next available queue entry, without having to wait until the other transactions release their locks.

[2] https://docs.microsoft.com/en-us/sql/t-sql/queries/output-cl...

Also known as SKIP LOCKED (Oracle, PostgreSQL, MySQL).

Unless I'm misunderstanding, it sounds like your use case fits RabbitMQ perfectly. Pseudocode:

    while True:
      message = queue.consume()
      process(message)
      message.ack()

RabbitMQ will automatically put the message back on the queue when the consumer that pulled it disconnects w/o acknowledging it first. Alternatively, you could explicitly reject messages:

    while True:
      message = queue.consume()
      try:
        process(message)
      except Exception:
        message.reject()
        raise
      else:
        message.ack()

If you're using Python you might want to check out Dramatiq[1]

[1]: https://dramatiq.io

Not exactly. Tasks can last for weeks. It can run fine for several days and then die, and it needs to be requeued, until it is explicitly finished. In fact, we do use RabbitMQ to emit "status updates".

With RabbitMQ, I'd need to ack right away, otherwise it would re-send the message after a while.

Very curious what type of work you're doing where atomic tasks can run for weeks at a time.

We have something like this, not weeks but days. Linear programs, integer math, using IBM CPLEX to schedule "people" to do "things" at the ideal time.

In our case, data migration. Resumable, sometimes dies for various reasons (or just infra rescaling happening), but takes very long to run.

> With RabbitMQ, I'd need to ack right away, otherwise it would re-send the message after a while.

This is not exactly the case. RMQ will only re-enqueue the message when the consumer disconnects. If you're able to keep the consumer connection alive (this is easy to do with the heartbeat mechanism) for the processing duration, even if it takes a long time, RMQ should handle it fine. That said, if the connection between your consumers and RMQ is flaky, you'll have to make your tasks re-entrant.

Good point. I'd rather have something explicit going on. In other places where we do use RabbitMQ (for short-lived, non-critical tasks), the listening processes log reconnects every once in a while, even with heartbeat.
I've seen folk talk about progress messages for long-running tasks and jobs. If you have checkpointing then they play well together.
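
A minimal sketch of how checkpointing and progress messages can fit together, assuming the work can be split into ordered items. The function `run_with_checkpoints`, the JSON checkpoint file, and the `handle` and `report` callbacks are hypothetical names for illustration; `report` is where a status update could be published.

```python
import json
import os


def run_with_checkpoints(items, checkpoint_path, handle, report):
    """Process items in order, resuming from the last saved checkpoint."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        handle(items[i])
        # Write to a temp file and rename it, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)
        # Progress message: items done so far, out of the total.
        report(i + 1, len(items))
```

If the process dies between `handle` and the checkpoint write, the same item is handled again on resume, so `handle` should be safe to repeat.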
We use Azure Storage Queues for this: https://azure.microsoft.com/en-us/services/storage/queues/ It's all done via REST calls but we use their C# API so it's all transparent to us.

An item is queued with a specific visibility timeout (it should take 10 seconds to process so we give it 10 minutes), a job picks up that item and it disappears from the queue for 10 minutes. If the job succeeds, the job explicitly deletes that queue item. If the job fails, the item reappears after 10 minutes for another instance of the job to pick up.
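
To make the visibility-timeout behavior concrete, here is a small in-memory model of it. This is a sketch of the semantics only, not the Azure Storage Queues API: the class `VisibilityQueue` and its methods are hypothetical, and it assumes the timeout is applied when an item is picked up.

```python
import itertools


class VisibilityQueue:
    """In-memory model of a queue with visibility timeouts (not the Azure API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._items = {}  # item_id -> [payload, visible_at, dequeue_count]

    def put(self, payload, now):
        """Enqueue an item that is visible immediately."""
        item_id = next(self._ids)
        self._items[item_id] = [payload, now, 0]
        return item_id

    def get(self, now, visibility_timeout):
        """Return the oldest visible item and hide it for visibility_timeout seconds."""
        for item_id, item in self._items.items():
            if item[1] <= now:
                item[1] = now + visibility_timeout
                item[2] += 1  # lets callers alert on items dequeued many times
                return item_id, item[0], item[2]
        return None

    def delete(self, item_id):
        """Called by a job that succeeded; the item never reappears."""
        self._items.pop(item_id, None)
```

A job that crashes simply never calls `delete`, and the item becomes visible again once the timeout has passed.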

We've been using it since April 2014. There's more information built into the queue item, like the number of times dequeued and original queued timestamp, so we can trigger alerts if items are getting old.

Jobs expire after 1 week, though.

As others mentioned, you can use a SQL transaction, but in either case I used Aerospike wherever I would have needed ActiveMQ, RabbitMQ, or ApolloMQ, and it works like a champ, with none of the connection/disconnection issues. It is very fast compared to a database, though slower than a dedicated message queue.

Note: I did not like messaging because of these connection timeouts, and I also found it very confusing to get the last message, etc. I'm not a pro programmer; I use VB.NET with .NET Core.

Messaging systems with individual message acknowledgements, redelivery, and dead-letter features are common in many enterprise systems. If you just need ack/retry ability, then Google's Cloud PubSub is about as cheap, fast and scalable as you can get. Otherwise look at Azure's Service Bus for more complicated routing.

If you're in AWS land, try a step function triggered from SNS. Executions can last up to a year. To shell out to processes that actually perform the work either invoke a lambda or start an ECS task.

Can you go a bit deeper into this please?

Not the GP author, but he's talking about using 4 different AWS services in a particular architectural pattern. SNS topics give you a triggering mechanism to start the long running task. Step Functions give you light-weight flow control and state management, but don't directly perform any interesting work. Instead, the step function steps can invoke Lambda functions or jobs in Elastic Container Service to do the actual work. When they finish, the step function can move on to the next step or retry things as needed.
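
As a rough illustration of the Step Functions part only, here is a sketch of a one-step state machine definition in Amazon States Language, built as a Python dict. The ARN, the state names, and the retry numbers are made up for the example; the SNS trigger and the worker itself are not shown.

```python
import json

# Hypothetical ARN; replace with a real Lambda function ARN.
MIGRATE_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:migrate-chunk"

state_machine = {
    "Comment": "Sketch: run one unit of work, retrying on failure",
    "StartAt": "MigrateChunk",
    "States": {
        "MigrateChunk": {
            "Type": "Task",
            "Resource": MIGRATE_LAMBDA_ARN,
            # Retry any error with exponential backoff (assumed values).
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 5,
                    "BackoffRate": 2.0,
                }
            ],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# JSON text that could be supplied as the state machine definition.
definition = json.dumps(state_machine, indent=2)
```

The retry policy lives in the definition rather than in the worker, which is what lets the worker die mid-run and be started again.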