Hacker News new | ask | show | jobs
by gtirloni 2538 days ago
In this environment:

Stripe splits data by kind into different database clusters and by quantity into different shards. Each cluster has many shards, and each shard has multiple redundant nodes.

having a few nodes down is perfectly acceptable. I guess they would have had an alert if the number of down nodes exceeded some threshold.

1 comments

this case that doesn't sound like it was the issue, it was the lack of promotion of new master due to the bug in the system in terms of shard promotion.