by ransom1538 1519 days ago
"I treat memcached in infrastructure as a very stable service."

I run memcached at a large scale. You are totally right. Every other year we will find ONE bad memcached node down. We use nutcracker instead of mcrouter for consistent hashing to each memcached node. Once I read "We also run a control plane for the cache tier, called Mcrib. Mcrib’s role is to generate up-to-date Mcrouter configurations" -- I was like oooooh boy, here we go....

Knowing memcache is a rock comes with experience though.


Our underlying hardware (AWS) is nothing like this reliable. We see regular (several times a year) failure of racks of machines or whole DCs.

Across the whole fleet (all services), we lose 1-10 servers per day as a baseline. Major events are then on top of that and can impact thousands of hosts at once.

What service is this?? This must be huge.
> I run memcached at a large scale

I don't believe you run it at the scale Slack does.

The people at Slack who decided to use Mcrouter (and created Mcrib) have experience running Memcached, Mcrouter and Nutcracker in production at two of the biggest web properties in the world.

Trust that they know whereof they speak.

You may not be wrong, in fact you are very likely right, but this is not an argument.

The larger an org gets, the more likely it is to do weird things to mitigate organizational difficulties, be they budgetary, human, or otherwise.

Those types of things rarely show up in postmortems for obvious reasons.

"I don't believe you run it at the scale Slack does."

Definitely not. We host about 80% of elementary schools in the US. Not Slack scale, but we definitely face many of the same issues :/