Hacker News
by dcchuck 1012 days ago
In the simplest terms: your app needs several different services, or several instances of one service, to agree on a value. There are many reasons you can't rely on something like the system clock to agree on when something happened (clocks on different machines drift apart, for example). This is where Raft steps in.

You'll often see "fault-tolerant" and "replicated state machines" mentioned alongside Raft. Let's break those down in this context.

For "fault-tolerance", think of production environments where I need to plan for hardware failure. If one of my services goes down, I want to keep operating, so we run a few copies of the app, and when one goes down, another running instance steps up.

In that case, how do we pick which copy is in charge? And how do all the copies agree on things while everything is working smoothly? Raft.
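To make "which copy is in charge" a bit more concrete: in Raft, a node that stops hearing from the leader becomes a candidate and asks the others for votes, and whoever collects a majority leads for that term. Below is a minimal Python sketch of just the vote-granting rule, under simplifying assumptions: `Node`, `VoteRequest` and `wins_election` are hypothetical names, nothing is persisted to disk, and election timeouts and networking are left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRequest:
    term: int            # candidate's current term
    candidate_id: str
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry


@dataclass
class Node:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_index: int = 0
    last_log_term: int = 0

    def handle_vote_request(self, req: VoteRequest) -> bool:
        """Return True if this node grants its vote to the candidate."""
        # A request from an older term is stale: reject it.
        if req.term < self.current_term:
            return False
        # A newer term: adopt it and forget any vote cast in the old term.
        if req.term > self.current_term:
            self.current_term = req.term
            self.voted_for = None
        # At most one vote per term (repeating a vote for the same candidate is fine).
        if self.voted_for not in (None, req.candidate_id):
            return False
        # Only vote for a candidate whose log is at least as up-to-date as ours:
        # the higher last term wins; on a tie, the longer log wins.
        if (req.last_log_term, req.last_log_index) < (self.last_log_term, self.last_log_index):
            return False
        self.voted_for = req.candidate_id
        return True


def wins_election(votes_granted: int, cluster_size: int) -> bool:
    """A candidate needs a strict majority of the whole cluster."""
    return votes_granted > cluster_size // 2


n = Node(current_term=3, last_log_index=5, last_log_term=3)
print(n.handle_vote_request(VoteRequest(4, "b", 5, 3)))  # True
print(n.handle_vote_request(VoteRequest(4, "c", 6, 3)))  # False: already voted for "b" in term 4
```

The "one vote per term" plus "strict majority" combination is what keeps two copies from both believing they are in charge for the same term.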

For "replicated state machines", let's stay in this world of fault-tolerance, where we have multiple instances of our app running. Each instance can host a state machine. The state machine promises that, given the same ordered series of events, it always arrives at the same value. Meaning: if all of our instances get the same events in the same order, they will all end up in the same state. Determinism.
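A toy illustration of that determinism, as a minimal Python sketch: the event format (`"set"`/`"add"` on a key) and the names `KVStateMachine` and `replay` are made up for this example. Two replicas replaying the same log end up with identical state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event format: (operation, key, amount)
Event = Tuple[str, str, int]


@dataclass
class KVStateMachine:
    """Toy key/value state machine: same events in the same order -> same state."""

    state: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        op, key, amount = event
        if op == "set":
            self.state[key] = amount
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation: {op}")


def replay(log: List[Event]) -> Dict[str, int]:
    """Build a fresh replica and apply every event in log order."""
    sm = KVStateMachine()
    for event in log:
        sm.apply(event)
    return sm.state


log = [("set", "x", 1), ("add", "x", 5), ("set", "y", 2)]
replica_a = replay(log)
replica_b = replay(log)
print(replica_a == replica_b, replica_a)  # True {'x': 6, 'y': 2}
```

The important property is that `apply` depends only on the current state and the event, with no clocks, randomness, or I/O, so replaying the log is enough to rebuild a replica.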

This is where it all comes together, and why I think the jargon becomes tightly coupled to an "easy to understand" definition.

You will reach for replicated state machines when you need deterministic state across multiple service instances. But the replicated state machines need a way to agree on which messages were received and in what order. That's the contract: if you give every instance all the messages in the same order, every instance will end up in the same state.

But how do we agree on order? How do we agree on which messages were actually received? Just because "Client A" sends messages "1" and "2" in a specific order does not guarantee they are delivered at all, let alone in that order.
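Here is what goes wrong without agreement, as a deliberately naive Python sketch: each hypothetical replica just applies messages in whatever order the network delivers them. The three delivery patterns are invented for illustration.

```python
from typing import List


def apply_in_arrival_order(arrivals: List[str]) -> List[str]:
    """Naive replica: append each message to its state in whatever order it arrives."""
    state: List[str] = []
    for msg in arrivals:
        state.append(msg)
    return state


sent = ["1", "2"]  # what Client A sent, in order
# Hypothetical deliveries to three replicas over an unreliable network:
replica_a = apply_in_arrival_order(["1", "2"])  # delivered in order
replica_b = apply_in_arrival_order(["2", "1"])  # reordered
replica_c = apply_in_arrival_order(["1"])       # "2" was lost

print(replica_a, replica_b, replica_c)  # ['1', '2'] ['2', '1'] ['1']
```

Every replica is perfectly deterministic here, yet all three disagree, because they never agreed on what the input sequence was.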

Raft creates "consensus" around these values. It allows the copies to settle on which messages were actually received and when.
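One concrete piece of how Raft settles this: the leader appends incoming messages to its log, replicates them to followers, and treats an entry as committed once a majority of the cluster has stored it. A minimal Python sketch of that commit rule follows, assuming the leader already tracks, for each follower, the highest log index known to be replicated there (Raft calls this matchIndex). The function name and the `log_terms` layout are assumptions for this example, and the real protocol has more to it (heartbeats, log repair, persistence).

```python
from typing import List


def majority_commit_index(
    leader_last_index: int,
    follower_match_index: List[int],
    log_terms: List[int],
    current_term: int,
    commit_index: int,
) -> int:
    """Highest index N stored on a majority with log_terms[N] == current_term.

    log_terms[i] is the term of the log entry at index i; index 0 is an unused
    placeholder so real entries start at 1. Returns commit_index unchanged if
    nothing new can be committed.
    """
    # The leader always holds its own entries, so it counts toward the majority.
    match = sorted([leader_last_index] + follower_match_index, reverse=True)
    majority = len(match) // 2 + 1
    # Sorted descending, the value at position majority-1 is held by
    # at least `majority` nodes, and so is every index below it.
    candidate = match[majority - 1]
    # Raft only commits entries from the leader's current term by counting
    # replicas; earlier entries become committed along with them.
    for n in range(candidate, commit_index, -1):
        if log_terms[n] == current_term:
            return n
    return commit_index


# Hypothetical 5-node cluster: leader has entries 1..7, followers have up to 7, 5, 3, 2.
terms = [0, 1, 1, 2, 2, 3, 3, 3]
print(majority_commit_index(7, [7, 5, 3, 2], terms, current_term=3, commit_index=2))  # 5
```

Once an index is committed, every replica applies entries up to it in log order, which is exactly the "same messages, same order" input the replicated state machine needs.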

So, you could use other approaches to keep "all your service copies getting along", but a replicated state machine is a nice one. That replicated state machine architecture needs some way to agree on order, and Raft is a great choice for that.