| HN Mirror



	by ksmith14 2889 days ago
	The Google SREs mention this in their book: the Chubby lock service had such high uptime that teams started to neglect making their own services resilient to Chubby failures: https://landing.google.com/sre/book/chapters/service-level-o...

2 comments

robax 2889 days ago

+1 for this book. As a junior DevOps engineer this book has been super helpful.


philsnow 2889 days ago

The book is structured so that it's easy to jump around and pick which parts to read or skip, so reading it isn't a very large commitment.


AdamM12 2888 days ago

Mine just came in the mail today. Pretty stoked.


mav3rick 2889 days ago

Still, that's bad design on the clients' part. E.g., just because malloc "never" fails doesn't mean it can't fail :) so you'd better check its return value.
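For illustration, a minimal sketch of what checking malloc looks like in practice. The helper name `copy_buffer` is hypothetical and not from the thread; the point is only that the NULL case is handled and reported to the caller instead of being assumed away.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of `src` into a new NUL-terminated
 * buffer. Returns NULL on bad input or allocation failure; the caller owns
 * the result and must free() it. */
char *copy_buffer(const char *src, size_t len)
{
    /* Reject NULL input and a length where len + 1 would wrap to 0. */
    if (src == NULL || len == SIZE_MAX) {
        return NULL;
    }

    char *dst = malloc(len + 1);
    if (dst == NULL) {
        /* malloc "never" fails, until it does: report it and let the
         * caller decide how to degrade. */
        fprintf(stderr, "copy_buffer: allocating %zu bytes failed\n", len + 1);
        return NULL;
    }

    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

A caller would then write something like `char *s = copy_buffer(name, n); if (s == NULL) { /* fall back or fail the request */ }` rather than dereferencing the result unconditionally.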


Filligree 2889 days ago

Doesn't matter. Engineering around human failure is part of the profession.


kjeetgill 2889 days ago

That's a beautiful way to put it. I'd read that book.


Filligree 2889 days ago

Well, I'm a Google SRE so...


smcameron 2888 days ago

Failure of malloc() might be a bad example to pick because on Linux, most distros overcommit by default, so malloc generally won't fail. Instead, malloc succeeds in allocating the address space just fine, but the RAM only gets allocated on first use. So even though malloc gave you a supposedly valid pointer rather than NULL, actually using that pointer can crash your program if the memory can't be backed at that point.
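To make the "allocated on first use" point concrete, here is a hedged sketch that writes to every page of a fresh allocation right away, so that any failure to back the memory happens at a known point instead of somewhere later in the program. The function name `alloc_touched` is hypothetical, and the 4096-byte fallback page size is an assumption. Note that this does not turn an out-of-memory kill into a NULL return; it only moves the moment of failure.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate `size` bytes and write to every page so the
 * kernel has to back the memory now rather than on some later first use.
 * Returns NULL if malloc itself reports failure. */
void *alloc_touched(size_t size)
{
    unsigned char *p = malloc(size);
    if (p == NULL) {
        return NULL;
    }

    long page = sysconf(_SC_PAGESIZE);
    size_t step = page > 0 ? (size_t)page : 4096; /* assumed fallback */

    /* volatile stops the compiler from eliding the writes or folding
     * malloc + zeroing into a lazily backed calloc. */
    volatile unsigned char *v = p;
    for (size_t off = 0; off < size; off += step) {
        v[off] = 0;
    }
    if (size > 0) {
        /* malloc results are not page-aligned, so the final page may not
         * be hit by the strided loop; touch the last byte explicitly. */
        v[size - 1] = 0;
    }
    return p;
}
```

The trade-off is that every allocation now costs real memory up front, which is exactly what overcommit is designed to avoid.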


mav3rick 2884 days ago

Other systems may be configured differently and return NULL. Relying on overcommit isn't portable, and it's just bad practice not to check for it.


tannhaeuser 2888 days ago

Is there a way to fix this/switch it off? I never got the rationale for this behaviour.


zorked 2888 days ago

There's a sysctl: vm.overcommit_memory=2

What most people don't realize is that you will get more OOMs if you disable overcommit.
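As a rough sketch of how a program could find out which mode it is running under, the setting above is exposed on Linux as the file /proc/sys/vm/overcommit_memory, where 0 means heuristic overcommit (the usual default), 1 means always overcommit, and 2 means strict accounting. The function name `read_overcommit_mode` is hypothetical; on systems without that file it simply returns -1.

```c
#include <stdio.h>

/* Hypothetical helper: read the overcommit mode from `path`
 * (normally "/proc/sys/vm/overcommit_memory").
 * Returns 0, 1, or 2, or -1 if the file is missing or unparseable. */
int read_overcommit_mode(const char *path)
{
    if (path == NULL) {
        return -1;
    }

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1; /* not Linux, or /proc is not available */
    }

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1 || mode < 0 || mode > 2) {
        mode = -1;
    }
    fclose(f);
    return mode;
}
```

With a result of 2, a NULL return from malloc becomes a realistic outcome that error handling has to cover; with 0 or 1, a non-NULL pointer says little about whether the memory can actually be backed.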
