No, you could have multiple servers talking to each other via something like ZooKeeper. But individually, each node/server would use multithreading to keep track of its state.
I dunno - you've presumably got to implement code to make sure that other servers get unlocked when your locking app server goes down, for example. Personally I try to avoid implementing ad-hoc distributed systems if I can avoid it. Having to shard is pretty damn crappy, but under the circumstances I'd favour it.
Further, I tend to think that if your application is heavy enough to outgrow a reasonably fat db server with (say) 500+GB of RAM, strategies like reimplementing consistent-db-esque locking in an app server + zookeeper setup are also likely to hit some pretty weird performance issues.
> you've presumably got to implement code to make sure that other servers get unlocked when your locking app server goes down
Don't see why you'd need locking. Node A receives a request to buy an item, A asks nodes B, C, and D to confirm that the item is in stock in each node's internal state. If all give the OK, A lets the purchase go through. If node B goes down, then A just asks C and D. I don't see the need for locking or a master/slave.
> a reasonably fat db server with (say) 500+GB of RAM
You have a single point of failure with a single, monolithic server. If its hd dies, you will be in serious trouble restoring terabytes of db.
EDIT: On rereading it occurs to me that I'm getting caught up in poking holes here. The proposed ZooKeeper solution will likely work fine in the simple case, if used with care and attention. It gives me the willies a bit (because I don't think it's a good idea to force users to separate locking from the DB level), but as long as the use case is restricted to compare + set on a single document, there's little reason to think that there's much wrong with it. Apologies if I come across as belligerent.
-----
> Don't see why you'd need locking. Node A receives a request to buy an item, A asks nodes B, C, and D to confirm that the item is in stock in each node's internal state. If all give the OK, A lets the purchase go through. If node B goes down, then A just asks C and D. I don't see the need for locking or a master/slave.
Presumably in order to ensure that B, C, and D can consistently check the object's state, nothing else can be allowed to write to it at that time - A effectively has to lock the object using zookeeper. You then have to make sure that the object gets unlocked if A goes down after locking it, or most particularly if A hangs, keeping the zookeeper session open? Otherwise your other app servers could get caught in a very long or infinite lock wait.
You've also got to keep track of the rest of your code, and make sure it never alters these same objects without locking. I'm assuming here that you only distributed-lock objects when you absolutely have to, in order to improve performance: otherwise you're going to start hitting some of the issues that make it hard to cluster relational DBs with decent performance (modulo some coarser lock granularity for a document-based system).
As you add complexity to this approach (say, perhaps you need to work on two objects at once), you also start having to think about other problems like deadlocking, or what happens if your zookeeper session fails part way through your work, where you've made half of the changes you want to make, and you can't easily roll back. This is all fine if you have programmers who are competent to think about these kinds of problems, but they're typically pretty expensive - and my experience is that most people just don't really bother to think about it.
> You have a single point of failure with a single, monolithic server. If its hd dies, you will be in serious trouble restoring terabytes of db.
Any system of value will of course have a replica, meaning you're perhaps more likely to be worried about network problems than the server falling over. Of course this is still less reliable than a perfectly implemented multi-master distributed system, but it's also hugely easier to use correctly - and the likelihood of failure of a single node is actually very low. Obviously if you literally cannot afford any downtime, maybe you go with a distributed system (or follow the banks and use mainframes..), but businesses that will be killed by network blips are certainly in the minority.