by notacoward 4442 days ago
Great stuff, but the same generally beneficial approach taken too far can run into its own problems.

http://hackingdistributed.com/2014/02/14/chainsets/

To put it simply, seven million is not a big number, and it's the wrong number anyway. The author confused permutations and combinations; the correct number of four-card hands from a deck including jokers is only 316251. For the more common N=3 it's a paltry 24804. If you're doing "pick any N" to choose replica sets for millions of objects (for example) then pretty quickly every node will have a sharding relationship with every other node. The probability of a widespread failure wiping out every member of some shard - leading to loss of data or service - approaches one. You're better off constraining the combinations somehow, certainly not all the way down to the bare minimum, but so that the total probability of data/service loss after N failures remains small.

I really hope people actually do the math instead of just cargo-culting an idea with a catchy name.


Author here! Well that's embarrassing, you're right about the 7 million number. I've updated the post to correct it. We'll be following up in a later blog post on the numbers we use for Route 53.
Awesome, thank you. Those of us who waste^H^H^H^H^Hspend our lives pondering these things need to stick together. Cheers.
Off-topic: ^W is more efficient and carries the same connotation of a strikethrough. I only recently found out about ^W myself, so I'm curious whether you actively chose ^H (character delete) over ^W (word delete), or whether ^W is simply not widely known as a bash/emacs command?

I feel that ^H is more widely known because those are the actual characters that people used to see in older terminals before remembering to type "stty erase ^H".

I used ^H because that has become the standard way to indicate a "correction" for humorous purposes. As to why that became the standard, it does go back to the days before Delete keys started going where the Backspace key is supposed to be, requiring that the erase character be set to DEL to compensate. It has nothing to do with emacs.
Isn't the number used later in the article incorrect too?

The article says: "Thus the real impact is constrained to 1/56th of the overall shuffle shards." Shouldn't it be 1/28th? It's 8*7 / 2 since the permutations "shards x, y" and "shards y, x" are the same as far as fault-tolerance is concerned.
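
A quick way to check the 56 versus 28 question, as a rough sketch: the variable names below are hypothetical, and the 8 shards with 2 per customer are taken from the "8*7 / 2" arithmetic above. This only counts pairs; whether 1/28th is the right impact figure still depends on what the article means by impact.

```python
from itertools import combinations
from math import comb, perm

SHARDS = 8          # assumption: total shards, from the "8*7" above
PER_CUSTOMER = 2    # assumption: shards assigned to each customer

# Ordered pairs treat (x, y) and (y, x) as different assignments.
ordered_pairs = perm(SHARDS, PER_CUSTOMER)

# Unordered pairs treat them as the same shuffle shard.
unordered_pairs = comb(SHARDS, PER_CUSTOMER)

# Cross-check by enumerating the unordered pairs explicitly.
enumerated = list(combinations(range(SHARDS), PER_CUSTOMER))

if __name__ == "__main__":
    print(ordered_pairs, unordered_pairs, len(enumerated))
```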

Is anyone in this area looking at balanced block designs? There's a rich area of math concerned with placing elements in sets subject to various constraints (e.g. minimizing cardinality of intersections). It's been a long time since I looked at that stuff, but I don't remember how easy it is to produce solutions for arbitrary N, which you would want to be able to increase as demand increases.