Hacker News new | ask | show | jobs
Three fundamental tricks for developers writing distributed systems (pedro.herokuapp.com)
55 points by pedrobelo 4952 days ago
6 comments

Many times the idea behind distributed systems is to avoid single point of failure. by using Database for communication, you are essentially creating another single point of failure in the form of database (unless database is running on some highly reliable elaborate master-master setup). However you can use systems like zookeeper to get similar functionality and to facilitate communication.
Isn't this the same as a message queue? Why would you want to rewrite this using a database? Also point/trick 2 seems unnecessary if you are using 3 (idempotent jobs). By queuing job ids you now have a consistency dependency between your message queue and database.
When using a non database-based queue you'll have to find another mechanism to make sure your operation is still atomic.

In other words you can end up in a situation where a record is inserted but the job to work on it is not enqueued, or worse - that an insert fails but the job to work on it is enqueued.

Point 2 is still necessary despite idempotency imo: lets say some value is updated to "a" and then to "b", enqueuing two jobs. If the request to update "b" runs before "a" then your receiver will end up with the wrong value. Same if the initial request to update "a" fails.

It seems to me that XA transactions one of those patterns that need to be rediscovered every generation. I see folks start with 0mq or Redis and hit edge cases where messages get lost.

I'd love to see a somewhat simple distributed transaction standard for http api's emerge.

There has to be a middle ground between the fiddly bits of soap's ws-reliable or full on JMS broker and ad-hoc Redis queues.

Perhaps I wasn't clear. What I meant was that if you enqueue the job directly and the job is idempotent, using approach 2 will introduce dependencies like the ones you pointed out.
When using a message queue you're forced/expected to flush messages quickly, this is often not the case.
There are lots of message queues that overcome this. A good example is Apache Kafka (http://incubator.apache.org/kafka/)
Actually I was considering mentioning Kafka when I wrote that comment and contrasting it with RabbitMQ.
The one thing that concerns me about using the database as a queue is that MVCC doesn't really lend itself to writing threading primitives like locks. I'm curious how one would go about writing a queue in an MVCC architecture--off the top of my head, I guess you could have a job assignments table to link processors and jobs, make the job FK unique and interpret forced rollbacks as indicating that another thread grabbed the job before you did. Then again, if your queue is only running idempotent functions it wouldn't matter if you had more than one thread doing the same work, it would just be a waste of time.
This was one of the main reasons explained to me on why db based queues are a code smell.

The other was that insert/select pattern of a queue hammers a db's index and data page fragmentation algorithms. Anyone know if this is still the case or am I repeating a wives-tale?

There are so many factors that go into database performance, but I would expect high frequency queuing would be problematic even today for the reasons you mention. If the load is low it's probably fine.
I believe one can make a stored procedure that implements an atomic enqueue or dequeue operation in a transaction. Still, such systems seem to end up polling, or having triggers that launch non-database activities such as scripts or network activity.

Note that I'm not endorsing this scheme.

Postgres has non-classic-rdbms features suitable for lock/wait sort of behavior for a queue.

I haven't experimented with it myself, beyond to note it's there.

http://www.postgresql.org/docs/9.1/static/sql-notify.html

Prior to 9.1, you couldn't include information in the notification. So you could prevent the listening threads from polling the table looking for work, but you'd still have the race condition. I don't know whether being able to include information in the notification is sufficient to prevent the race condition either. I suppose if you notified with the ID of the entry, and the IDs are evenly distributed across integers you could do modular arithmetic with the thread IDs but you'll have to coordinate among the threads. I also don't know what sort of guarantees Postgres makes about notification delivery.

My intuition is that this isn't enough to bypass the problem, because it addresses the polling side rather than the enqueuing/dequeuing side. But that doesn't mean there isn't a clever way to put this to use.

For a while I was using my own custom written job distribution system built on the database but then I discovered Gearman and since I implemented it I have seen increased reliability and productivity with creating new jobs.

I do not recommend rolling your own system. Use something that is already built as a server to accomplish the task. Amazon SQS is also a good solution.

Just what sort of "distributed" is this talking about, that there's a central DB to put the queue in?
Not advertising that a queue (or any database for what matters) is shared between apps!
What pun?
I think he means "enjoy the benefits of acid" as in LSD. Not much of a pun.