by rescrv 3290 days ago

I think you're pointing out a good tradeoff here. The original copysets work lets you explicitly trade off the likelihood of data loss against the amount of work done for recovery. A cluster replicated to minimize the likelihood of losing a replica set under correlated failure will have a higher cost of recovery from failure. A cluster replicated to minimize recovery time (e.g., RAMCloud's random allocation) will likely lose entire replica sets when several nodes fail at once, even if the failures are random.
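A rough Python sketch of that tradeoff, as a toy model rather than code from the copysets paper. The function names, the cluster size, and the simplification that exactly r nodes fail at once are all assumptions made for illustration. It compares random placement against a permutation-based placement with a small scatter width, counting how many distinct replica sets exist (each one is a way to lose data) and how many peers can help a node recover.

```python
import random
from math import comb


def random_replication(num_nodes, r, num_chunks, rng):
    """Random placement: each chunk picks r distinct nodes uniformly at random."""
    return {frozenset(rng.sample(range(num_nodes), r)) for _ in range(num_chunks)}


def permutation_copysets(num_nodes, r, scatter_width, rng):
    """Copyset-style placement (simplified): build ceil(S / (r - 1)) random
    permutations of the nodes and chop each one into consecutive groups of r."""
    num_perms = -(-scatter_width // (r - 1))  # ceiling division
    nodes = list(range(num_nodes))
    copysets = set()
    for _ in range(num_perms):
        rng.shuffle(nodes)
        for i in range(0, num_nodes - r + 1, r):
            copysets.add(frozenset(nodes[i:i + r]))
    return copysets


def loss_probability(copysets, num_nodes, r):
    """Chance that r simultaneous, uniformly random node failures hit
    exactly one of the replica sets in use (assumed failure model)."""
    return len(copysets) / comb(num_nodes, r)


def scatter_width(copysets, node):
    """Number of distinct peers sharing a replica set with `node`,
    i.e. how many nodes can take part in its recovery."""
    peers = set()
    for cs in copysets:
        if node in cs:
            peers |= cs
    peers.discard(node)
    return len(peers)


if __name__ == "__main__":
    rng = random.Random(0)
    n, r = 99, 3
    rand = random_replication(n, r, num_chunks=10_000, rng=rng)
    perm = permutation_copysets(n, r, scatter_width=4, rng=rng)
    for name, sets in (("random", rand), ("copysets", perm)):
        print(name, len(sets), loss_probability(sets, n, r), scatter_width(sets, 0))
```

Random placement ends up with far more distinct replica sets, so a random failure of r nodes is much more likely to wipe one out, but each node has many more peers to recover from. The permutation scheme reverses both properties.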

Chainsets were an attempt to add the properties of copysets to a system based upon chain replication.

Working with the original Copysets authors, we refined the chainsets algorithm into a tiered replication algorithm that can enforce independence assumptions on the replica set (what you've termed anti-affinity). Here's the paper on the subject: https://www.usenix.org/conference/atc15/technical-session/pr...