by stolsvik 1224 days ago
The big point with messaging is that you get rollback and retries. Mats leverages this.

If Stage N of the total process has picked up a message and started processing it, and something then fails temporarily (or the node crashes), whatever DB operations you have done so far are rolled back, and so is the pickup of the message itself.

The MQ will now reissue the same message, and it will be picked up again. This time, things work out, a new message is produced, and the entire processing of this stage is committed.

So, either you have received a message, done your DB-stuff, and sent an outgoing message, or you have done none of that.
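To make the all-or-nothing behaviour concrete, here is a minimal in-memory sketch. It is not the Mats API and not a real message broker: `run_stage`, `TransientError`, and the dict-as-database are all hypothetical stand-ins, and the "transaction" is just staged copies that are either applied together or dropped.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical stand-in for a temporary failure, e.g. a DB hiccup."""


def run_stage(inbox, outbox, db, handler):
    """Process one message so that receive, DB writes and send commit together."""
    msg = inbox[0]        # peek only: the message stays queued until commit
    db_tx = dict(db)      # staged copy of the "database"
    out_tx = []           # staged outgoing messages
    try:
        handler(msg, db_tx, out_tx)
    except TransientError:
        return False      # rollback: staged work is dropped, message is still in inbox
    # Commit: apply DB changes, send outgoing messages, acknowledge the incoming one.
    db.clear()
    db.update(db_tx)
    outbox.extend(out_tx)
    inbox.popleft()
    return True


inbox, outbox, db = deque([{"order": 42}]), deque(), {}
attempts = {"n": 0}


def handler(msg, db_tx, out_tx):
    attempts["n"] += 1
    db_tx[msg["order"]] = "reserved"
    if attempts["n"] == 1:
        raise TransientError("database briefly unavailable")
    out_tx.append({"order": msg["order"], "status": "reserved"})


first = run_stage(inbox, outbox, db, handler)    # fails, everything is rolled back
after_first = (dict(db), len(inbox), len(outbox))
second = run_stage(inbox, outbox, db, handler)   # redelivery succeeds and commits
```

After the first attempt nothing has changed: the DB is empty, the message is still waiting, and nothing was sent. After the second, all three effects are visible at once.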

I do not see how you would easily do that with REST, or at least you would have to write quite a bit of code to get such nice semantics. With messaging and Mats, you get it entirely for free.

To be fair, that is not entirely true. There are two separate transactions in play here, and they can fail in an annoying way. I write about this here: https://mats3.io/using-mats/transactions-and-redelivery/


But rollback and retries are specific to the systems executing the commands, not to the interface that invokes them. If you have an RPC-based transaction (command1, command2, command3) that fills a list with rollback commands and, in case of a failure, runs through that list, that does the same thing as putting them in a queue. If you don't have rollbacks in the system (i.e. stuff was written to the DB during the first command), the queue isn't gonna help with that.

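The "list of rollback commands" approach described above could be sketched roughly like this. The function name `run_with_compensation` and the list-as-state are made up for illustration; the commands are plain callables rather than real RPC calls.

```python
def run_with_compensation(steps):
    """Run (do, undo) pairs in order; on failure, undo the completed ones in reverse."""
    undo_log = []
    try:
        for do, undo in steps:
            do()
            undo_log.append(undo)   # only recorded once the command succeeded
    except RuntimeError:
        for undo in reversed(undo_log):
            undo()
        return False
    return True


state = []


def fail():
    raise RuntimeError("command3 failed")


ok = run_with_compensation([
    (lambda: state.append("command1"), lambda: state.remove("command1")),
    (lambda: state.append("command2"), lambda: state.remove("command2")),
    (fail, lambda: None),
])
```

Note that the undo list lives in memory here, so this sketch says nothing about what happens if the process itself crashes midway.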
I might not have gotten across clearly how this works.

It is each stage that is transactional. If the stage processing fails (or the node crashes), both the DB transaction and the messaging transaction are rolled back.

It is then retried. ActiveMQ has a default of 1 delivery and 6 redelivery attempts. If all 7 fail, the message is assumed to be "poison" and is put on the DLQ (dead letter queue).
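The delivery counting can be simulated in a few lines. This is only an in-memory sketch, not ActiveMQ client code: `deliver` and the deques are hypothetical, and the only number taken from the text above is the 6 redeliveries.

```python
from collections import deque

MAX_REDELIVERIES = 6  # the ActiveMQ default cited above


def deliver(queue, dlq, handler, max_redeliveries=MAX_REDELIVERIES):
    """Try one message up to 1 + max_redeliveries times, then dead-letter it."""
    msg = queue.popleft()
    for attempt in range(1, max_redeliveries + 2):
        try:
            handler(msg)
            return attempt        # committed on this attempt
        except Exception:
            continue              # rolled back, redeliver
    dlq.append(msg)               # assumed poison
    return None


calls = {"n": 0}


def poison(msg):
    calls["n"] += 1
    raise ValueError("cannot process")


queue, dlq = deque(["m1"]), deque()
result = deliver(queue, dlq, poison)
```

Here the handler is invoked 7 times in total before `m1` ends up on the simulated DLQ.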

But this means that a Mats Flow is a series of transactions, each handled individually. As you probably allude to, you cannot roll back the entire flow; only the particular stage is rolled back. Thus, if the message ends up on the DLQ, you have a half-finished process, where the preceding stages are already done and committed, while this stage and any downstream stages are not yet done.

The message is, however, on the DLQ. If the problem was e.g. a temporary database failure which is now resolved, you can just reissue the message (move it from the DLQ back to its queue), and the process will continue as if nothing had happened.
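As a rough illustration of parking a message and reissuing it later, here is another in-memory sketch. Again, `drain`, `reissue`, and the dict-based "database" are hypothetical names for this example, not Mats or broker APIs.

```python
from collections import deque


def drain(queue, dlq, handler, max_attempts=7):
    """Deliver each message up to max_attempts times; park failures on the DLQ."""
    while queue:
        msg = queue.popleft()
        for _ in range(max_attempts):
            try:
                handler(msg)
                break             # stage committed
            except ConnectionError:
                continue          # rolled back, redeliver
        else:
            dlq.append(msg)       # every attempt failed


def reissue(dlq, queue):
    """Move every dead-lettered message back to its original queue."""
    while dlq:
        queue.append(dlq.popleft())


# Stage 1 has already committed its work; the database then goes down.
db = {"up": False, "rows": ["stage1: order 42 reserved"]}
stage2_queue, dlq = deque([{"order": 42}]), deque()


def stage2(msg):
    if not db["up"]:
        raise ConnectionError("database unavailable")
    db["rows"].append(f"stage2: order {msg['order']} shipped")


drain(stage2_queue, dlq, stage2)
parked = len(dlq)                 # the message is now on the DLQ
db["up"] = True                   # the outage is resolved
reissue(dlq, stage2_queue)
drain(stage2_queue, dlq, stage2)  # the flow continues from stage 2
```

The stage 1 row is never touched by the failure; only stage 2 is retried, and after the reissue the flow finishes from where it stopped.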