| HN Mirror


by ransom1538 1519 days ago

"I treat memcached in infrastructure as a very stable service."

I run memcached at a large scale. You are totally right. Every other year we will find ONE bad memcached node down. We use nutcracker instead of mcrouter for consistent hashing to each memcached node. Once I read "We also run a control plane for the cache tier, called Mcrib. Mcrib’s role is to generate up-to-date Mcrouter configurations" -- I was like oooooh boy, here we go....
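For readers unfamiliar with the term, here is a minimal sketch of what "consistent hashing to each memcached node" means. It is a generic hash ring for illustration only, not the code nutcracker or mcrouter actually use; the `HashRing` class, the `vnodes` parameter, and the `cache-*` host names are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring (hypothetical, not a real proxy's API)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` pseudo-random points so that
        # keys spread roughly evenly across nodes.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


# Hypothetical node addresses, for illustration only.
nodes = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]
ring = HashRing(nodes)
smaller_ring = HashRing(nodes[:2])  # same ring with cache-c removed

keys = [f"user:{i}" for i in range(1000)]
# Keys that were not on the removed node keep their original owner.
moved = [
    k for k in keys
    if ring.node_for(k) != "cache-c:11211"
    and ring.node_for(k) != smaller_ring.node_for(k)
]
```

The point of the design is that losing one node only remaps the keys that lived on that node; the rest of the cache stays warm, which is why a rare node failure is tolerable with a static node list.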

Knowing memcache is a rock comes with experience though.

2 comments

iamcal 1519 days ago

Our underlying hardware (AWS) is nothing like this reliable. We see regular (several times a year) failure of racks of machines or whole DCs.

Across the whole fleet (all services), we lose 1-10 servers per day as a baseline. Major events are then on top of that and can impact thousands of hosts at once.


ransom1538 1516 days ago

What service is this?? This must be huge.


muglug 1519 days ago

> I run memcached at a large scale

I don't believe you run it at the scale Slack does.

The people at Slack who decided to use Mcrouter (and created Mcrib) have experience running Memcached, Mcrouter and Nutcracker in production at two of the biggest web properties in the world.

Trust that they know whereof they speak.


tempest_ 1519 days ago

You may not be wrong, in fact you are very likely right, but this is not an argument.

The larger an org gets, the more likely it is to do weird things to mitigate organizational difficulties, be they budget, human or otherwise.

Those types of things rarely show up in postmortems for obvious reasons.


ransom1538 1519 days ago

"I don't believe you run it at the scale Slack does."

Definitely not. We host about 80% of elementary schools in the US. Not Slack scale but definitely face many of the same issues :/
