by free-ekanayaka 2995 days ago
If you read the Raft paper, or just watch presentations of it, you'll see that all the issues you mentioned are covered:

- faulty node replacement: you can take any node offline (or any node can crash) at any time. As long as enough nodes are left to reach a quorum, your system will be available. If there are not enough nodes left, your system will be unavailable (but it will stay consistent)

- re-syncs, snapshots and the rest are all covered in the Raft paper

- clients wanting to perform writes always talk to the node that is currently the leader. If that node fails, clients will look for the next leader (which will eventually be elected as long as there's a quorum of surviving nodes)

- overwhelming a system is a different concern. Raft writes are serialized, so the goal is usually not throughput (for that you might look at AP/CA storage solutions in the CAP spectrum; Raft is CP). Designs that need high write throughput might still use Raft as an internal building block for coordination.
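
To make the quorum point in the first bullet concrete, here is a minimal sketch of the arithmetic, assuming a quorum means a strict majority of the configured cluster size. The function names are made up for illustration and are not taken from any Raft implementation.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest strict majority of the cluster (assumed quorum definition)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def is_available(cluster_size: int, failed_nodes: int) -> bool:
    """True if the surviving nodes can still form a quorum."""
    if not 0 <= failed_nodes <= cluster_size:
        raise ValueError("failed_nodes must be between 0 and cluster_size")
    return cluster_size - failed_nodes >= quorum_size(cluster_size)


def max_tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be down while the cluster stays available."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), max_tolerated_failures(n))
```

Under that assumption, a 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are the usual choice.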