| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ploxiln 3784 days ago

The quality of your C code running on a single node is by all accounts really good. But you sometimes speak about distributed systems in a way that is not correct to people who have really spent a lot of time making these things work at scale.

I've done distributed systems, and in fact I usually don't implement systems that are 100% correct under maximally-adverse conditions, for performance and convenience reasons. But it's important to know and acknowledge where and how the system is less than 100% bullet-proof, so that you can understand and recover from failures. When you have 1000 servers, some virtual in the cloud, some in a colocation facility, and maybe 10 or 20 subtle variants in hardware, you will get failures that should not happen but they do. For example, I've had an AWS host where the system time jumped backwards and forwards by 5 minutes every 30 seconds. And I've seen a tcp load-balancer appliance managed by a colo facility do amazingly wrong things.

The point is, you need to understand where the consistency scheme in a distributed system is "fudged", "mostly reliable", "as long as conditions aren't insane", and where they are really guaranteed (which is almost nowhere). Then you can appropriately arrange for "virtually impossible" errors/failures to be detected and contained. This ad-hoc logic you bring to the problem of distributed systems doesn't really help.

Not to "dogpile", but elucidate for others who may not know some of the history here: https://aphyr.com/posts/287-asynchronous-replication-with-fa... and the also interesting reply: http://antirez.com/news/56

1 comments

antirez 3784 days ago

Hello ploxiln, to see systems with clocks jumping is indeed possible, I'll make it more clear in my blog post, but basically it is possible with some effort to prevent this in general, but more specifically the Martin suggestion of using the monotonic time API is a good one and was already planned. It's very hard to argue that the monotonic time jumps back and forth AFAIK, so I think it's a credible system model. The monotonic API surely avoids a lot of pitfalls that you can have with gettimeofday() and the sysadmin poking the computer time in one way or the other (ntpd or manually).

About the ad-hoc logic, in this case I think the problem is actually that you want to accept only what already exists. For example there are important distributed systems papers where the assumption of absolute time bound error among processes is made, using GPS units. This is a much stronger system model (much more synchronous) compared to what I assume, yet this is regarded as acceptable. So you have to ask yourself if, regardless of the fact this idea is from me, if you think that a computer, using the monotonic clock API, can count relative time with a max percentage of error. Yes? Then it's a viable system model.

About you linking to Aphyr, Sentinel was never designed in order to make Redis strongly consistent nor to to make it able to retain writes, I always analyzed the system for what it is: a failover solution that, mixed with the asynchronous replication semantics of Redis, has certain failure modes, that are considered to be fine for the intended use case of Redis. That is, it has just a few best-effort checks in order to try to minimize the data loss, and it has ways to ensure that only one configuration wins, eventually (so that there are no permanent split brain conditions after partitions heal) and nothing more.

To link at this I'm not sure what sense it makes. If you believe Redlock is broken, just state it in a rigorous form, and I'll try to reply.

If you ask me to be rigorous while I'm trying to make it clear with facts what is wrong about Martin analysis, you should do the same and just use arguments, not hand waving about me being lame at DS. Btw in my blog post I'll detail everything and show why I think Redlock is not affected by network unbound delays.

link

ploxiln 3784 days ago

It's fair to criticize my somewhat harsh and yet hard-analysis-free critique. But let me explain why it matters to me, and what the relation to Redis Sentinel is:

"Redlock" is a fancy name, involves multiple nodes and some significant logic, less experienced developers will assume it's bulletproof.

"Redis Sentinel" is a fancy name, involves multiple nodes and some significant logic, less experienced developers will assume it's bulletproof.

I guess I'd prefer that people shoot themselves in the foot with their own automatic failover schemes, instead of using a publicized ready-to-use system which has marketing materials (name, website, spec, multiple implementations) which lead them to believe it is bulletproof, and these systems end up being used in for a bunch of general-purpose cases without specific consideration for their failure modes. In this case, the "marketing" is just the fact that you're behind it, and redis really is solid for what it is.

(Usually I avoid automatic promotion/fail-over altogether. It's often not a good idea. Instead I find some application-specific way of operating in a degraded state until a human operator can confirm a configuration change. So service is degraded for some minutes, but potential calamity of automatic systems doing wrong things with lots of customer data is averted.)

It's really not your responsibility to make sure the open source software you offer to the world for free is used appropriately. But people like me will be annoyed when it contributes to the over-complicated-and-not-working messes (not directly your fault) we have to clean up :)

One specific comment: http://redis.io/topics/distlock notes the random value used to lock/unlock to prevent unlocking a lock from another process. But at that point, the action just before the unlock could very well also have been done after the lock was acquired by another process. So while this is a good way to keep the lock state from being corrupted, it is really a side-note rather than a prominent feature, since if this happens, the data the lock was protecting could well have been corrupted.

link

wyaeld 3784 days ago

I have to disagree here.

"fancy names == people think bulletproof" is not a credible criticism.

Now "Oracle Enterprise Manager Fusion Middleware Control" is a fancy name!!! Pretty sure no-one with a clue thinks a product named like that is bulletproof, and its probably got significant logic/nodes etc

I apparently have the opposite experience to you. I'll often find complex distributed systems that are painful to troubleshoot when they misbehave, and find it refreshing on the other hand when someone is just using Redis, because typically THAT system is working a lot more predictable.

If people are too inexperienced to realize that ALL systems have tradeoffs, and not read up on what those are (because apparenly the name is fancy) then they'll get burned. Antirez does a pretty good job of explaining and documenting where he sees Redis limits to be.

Arguing that everyone should be home-baking their own failovers until of learning the limits of well-known ones our there doesn't seem like responsible advice.

link

querulous 3784 days ago

using the monotonic clock api may not actually be an improvement. that gets you deterministic ordering on a single node but increases possible clock skew (you will still see clock drift but it will now be clamped limiting the ability to bring it back in sync)

also note that you need deterministic ordering over the cluster and deterministic ordering on individual nodes does not actually grant you any greater certainty

i believe you're referring to google's chubby lock system with your reference to gps but the clocks were only used as an optimization in chubby, it still had to implement a consensus protocol

link