by nh2 604 days ago
You are right that anything that needs up to 50000 atomic, short-lived transactions per second can just use Postgres.

Your UPDATE transaction lasts just a few microseconds, so you can just centralise the problem and that's good because it's simpler, faster and safer.

But this is not a _distributed_ problem, as the article explains:

> remember that a lock in a distributed system is not like a mutex in a multi-threaded application. It’s a more complicated beast, due to the problem that different nodes and the network can all fail independently in various ways

You need distributed locking if the transactions can take seconds or hours, and the machines involved can fail while they hold the lock.
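
To make the "can fail while they hold the lock" point concrete, here is a minimal sketch of a lock with an expiring lease. It is an in-memory stand-in, not any real lock service: `LeaseLock`, `acquire`, `release` and the injected `clock` are hypothetical names made up for illustration. A real deployment would keep this state in a replicated store rather than in one process.

```python
import time


class LeaseLock:
    """Hypothetical in-memory lock whose ownership expires after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the example can fake time
        self._holder = None        # client currently holding the lease
        self._expires_at = 0.0     # clock value at which the lease lapses

    def acquire(self, client_id, ttl_seconds):
        """Grant the lease unless another client holds an unexpired one."""
        now = self._clock()
        if self._holder is not None and now < self._expires_at:
            return False
        self._holder = client_id
        self._expires_at = now + ttl_seconds
        return True

    def release(self, client_id):
        """Release only if the caller still holds an unexpired lease."""
        if self._holder == client_id and self._clock() < self._expires_at:
            self._holder = None
            return True
        return False


# Simulated time, so the crash scenario runs instantly.
now = [0.0]
lock = LeaseLock(clock=lambda: now[0])

got_a = lock.acquire("worker-a", ttl_seconds=30)        # granted
got_b_early = lock.acquire("worker-b", ttl_seconds=30)  # refused, lease is live
now[0] = 31.0                                           # worker-a died and never released
got_b_late = lock.acquire("worker-b", ttl_seconds=30)   # granted after expiry
```

The expiry is what lets the system recover from a dead holder. It is also what makes long-held locks harder than a short UPDATE: a holder that is merely slow can outlive its lease without noticing.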


You could just have multiple clients attempt to update a row that defines the lock. Postgres transactions have no time limit by default and are rolled back on client failure. Since connections are persistent, there’s no need to play a game to determine the state of a client.

Your scenario still uses a single, centralised Postgres server. Failure of that server takes down the whole locking functionality. That's not what people usually mean by "distributed".

"the machines involved can fail" must also include the Postgres machines.

To get that, you need to coordinate multiple Postgres servers, e.g. using ... distributed locking. Postgres does not provide that out of the box -- neither multi-master setups, nor master-standby synchronous replication with automatic failover. Wrapper software that provides that, such as Stolon and Patroni, uses distributed KV stores / lock managers such as etcd and Consul to do so.

> up to 50000 atomic, short-lived transactions per second

50000?

> You need distributed locking if the transactions can take seconds or hours, and the machines involved can fail while they hold the lock.

From my experience, locks are needed to ensure synchronized access to resources. Distributed locks are a form of that isolation being held across computing processes, as opposed to the mutex example provided.

And while our implementation definitely did not use a distributed lock, we could still see those machines fail.

I fail to understand why a distributed lock is needed for anything due to its duration.

Mostly guessing, but duration is usually inversely correlated with throughput.

If you require high throughput and have long durations, then partitioning/distribution is the usual solution.
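
A back-of-the-envelope version of that, as a sketch: if holders of a single lock run strictly one at a time, the hold time caps the rate, and the number of independent locks (partitions) needed grows with rate times duration. The function names and the 20 µs / 2 s figures are assumptions picked for illustration; 20 µs is simply the hold time that lines up with the 50000 per second mentioned upthread.

```python
def max_serialized_throughput(hold_microseconds):
    """Upper bound on acquisitions per second for one lock held serially."""
    return 1_000_000 / hold_microseconds


def partitions_needed(target_per_second, hold_microseconds):
    """Independent locks needed to sustain the target rate, rounded up."""
    busy = target_per_second * hold_microseconds  # lock-microseconds per second
    return -(-busy // 1_000_000)                  # integer ceiling division


short_rate = max_serialized_throughput(20)         # 20 µs hold: 50000.0 per second
short_parts = partitions_needed(50_000, 20)        # one lock is enough
long_parts = partitions_needed(100, 2_000_000)     # 2 s hold at 100/s: 200 partitions
```

So a short transaction fits on one central lock, while a lock held for seconds at any meaningful rate forces the work to be split across many locks.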