| HN Mirror



	by acejam 3382 days ago
	When that single web server goes down, it's not so "fast" anymore.

2 comments

patrickg_zill 3382 days ago

Agreed, though I keep wanting to take the time to get VRRP working with a web server to have redundancy. OpenBSD uses a similar protocol (CARP, together with pfsync) to coordinate stateful firewalls across two or more systems; if one goes down, all state info is present on the second node, which takes over.
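
The failover idea described here can be sketched roughly as follows. This is a toy model in Python, not the actual protocol: real implementations exchange periodic advertisements and rely on timers, all of which is left out. The `Node` class, `elect_master`, and the host names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str       # hypothetical host name
    priority: int   # higher priority wins the election
    alive: bool = True


def elect_master(nodes):
    """Return the live node with the highest priority, or None if all are down."""
    live = [n for n in nodes if n.alive]
    return max(live, key=lambda n: n.priority) if live else None


if __name__ == "__main__":
    cluster = [Node("web1", 200), Node("web2", 100)]
    print(elect_master(cluster).name)   # the preferred node serves traffic
    cluster[0].alive = False            # simulate the active node going down
    print(elect_master(cluster).name)   # the backup takes over
```

Only one node is active at a time; the others are there purely to take over, which is the limitation mentioned in the reply below.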


titpetric 3382 days ago

Hi, OP author here: I have actually set up VRRP (well, UCARP) on Docker, so it's possible to containerize even this facet of running an HA ops stack with Docker as the infrastructure. However, as you say, it is only used for one active node plus a number of fail-overs in case that one goes down. In terms of maintenance (hosts do go down, scheduled downtime is common), it's priceless to have this part of the puzzle portable as well. If you want to check it out, there's a GitHub repo available here: https://github.com/titpetric/ucarp-ha - and a future article about it is planned as well. It will also become a part of the e-book which I'm currently working on and publishing on Leanpub: https://leanpub.com/12fa-docker-golang :)


patrickg_zill 3382 days ago

OK so that runs on the host to the Docker instances. Pretty cool!


undersuit 3382 days ago

That sounds like a fixable problem. I'm pretty sure Erlang programmers could give some tips.

Why is a single web server going down more worrisome than some part of the Docker stack going down and causing the same issue?


titpetric 3382 days ago

Actually, neither should be a problem if you have enough redundancy :) The hardest part of rolling your own infrastructure is testing mission-critical systems (like databases) to be fault tolerant and at the same time reliable. Lots of great projects out there address some of these issues, but it takes a lot of attention to detail (transaction rates, ACID compliance, replication, etc.) to get it right. This is why a lot of developers who aren't at unicorn startups take advantage of technology available from giants like Amazon or Google, or from problem-domain-specific companies like CloudFlare. Netflix is a great example of a technology-driven company that is an inspiration to us, but there are so many others that really changed the way we approach problems - Tumblr, Etsy. But to stay on the topic of Netflix - I think their idea behind "chaos monkey" is great, and we're increasingly rolling out a (currently simple) Docker Swarm version of it - https://github.com/titpetric/docker-chaos-monkey - the best way to eliminate worry is to test failure scenarios. As docker chaos monkey is designed to unpredictably "kill off" containers, your system gets the benefit of being designed to handle failures. It's one of those problems that you have to have a passion for, however - it's like testing software. You're only testing software for the functionality and failures which you can predict, and I'm pretty sure that none of us can predict all the ways in which software (or distributed systems) can fail. As such, it's a never-ending occupation. :)
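
To make the "kill off containers unpredictably" idea concrete, here is a minimal sketch in Python. It is not the code from the linked repository and makes no Docker calls: `pick_victims`, `chaos_round`, the `kill_fraction` parameter, and the container names are all assumptions for illustration, and the actual kill action is passed in as a callback.

```python
import random


def pick_victims(containers, kill_fraction=0.25, rng=None):
    """Choose a random subset of containers to kill in one chaos round."""
    if rng is None:
        rng = random.Random()
    if not containers:
        return []
    # Always kill at least one container so every round tests something.
    count = max(1, int(len(containers) * kill_fraction))
    return rng.sample(containers, count)


def chaos_round(containers, kill, kill_fraction=0.25, rng=None):
    """Call kill(name) for each chosen container and return the chosen names."""
    victims = pick_victims(containers, kill_fraction, rng)
    for name in victims:
        kill(name)
    return victims


if __name__ == "__main__":
    killed = []  # stand-in for a real kill action; here we only record names
    names = ["web1", "web2", "db", "cache"]  # hypothetical container names
    chaos_round(names, killed.append, rng=random.Random(42))
    print("killed this round:", killed)
```

Passing in a seeded `random.Random` makes a round reproducible when debugging a failure it uncovered, while the default stays unpredictable.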
