Hacker News
by dpdp_ 5215 days ago
There are a lot more patterns of high availability architecture than load balancing.

Distributed queues, pub/sub, and gossip are just a few that come to mind.

In your example, you are using what is called a classical three-tier web architecture - a Load Balancer + Stateless Nodes + Scalable Storage. The most interesting part of an HA setup in a three-tier web architecture is the HA setup of the persistent storage component. It looks like you actually haven't figured out that piece yet and are waiting for a vendor (Microsoft) to solve this for you.

You can improve the availability of your persistent storage (MSSQL) in several simple (or not so simple) ways:

1. Use a SQL proxy load balancer (or a cluster setup) - the same load balancer HA pattern you are already using for the web tier.

2. Shard. You will scale writes and significantly reduce the probability of your system becoming completely unavailable.
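
To make the sharding point concrete, here is a minimal sketch of hash-based shard routing. The shard names and the helper functions `shard_for` and `reachable_fraction` are hypothetical, made up for illustration; nothing here reflects how Stack Exchange actually lays out its data. The idea is only that when keys are spread across independent shards, losing one shard makes part of the data unavailable instead of all of it.

```python
import hashlib

# Hypothetical shard names, used only for illustration.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key, shards=SHARDS):
    """Map a key to a shard with a stable hash (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def reachable_fraction(keys, down):
    """Return the fraction of keys whose shard is not in the failed set."""
    keys = list(keys)
    ok = sum(1 for k in keys if shard_for(k) not in down)
    return ok / len(keys)


if __name__ == "__main__":
    sample = [f"user-{i}" for i in range(1000)]
    # With one of four shards down, most keys should still be reachable.
    print(reachable_fraction(sample, {"db-shard-0"}))
```

Modulo placement is the simplest choice but reshuffles most keys when the shard count changes; consistent hashing or a lookup table is the usual alternative if shards will be added later.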

5 comments

Disk device level replication like DRBD, plus a failure control framework like Linux Heartbeat, goes a long way toward providing an HA cluster for a database. Since the replication and failover are at the device level, the solution works with any disk-based system, including databases.

In my experience, a failure can be detected and switched over in seconds. If you add a reverse ARP setup to share a virtual IP among the clustered servers, the clients won't even have to talk to a different IP after a failover. The only case the clients need to handle is retrying after a failed connection, which they should already be doing to deal with the occasional network failure.
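
A minimal sketch of that client-side retry, assuming the client surfaces a dropped connection as Python's built-in `ConnectionError`. The function name `call_with_retry` and its parameters are hypothetical; a real database driver will raise its own exception types, so treat this as the shape of the idea rather than drop-in code.

```python
import time


def call_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect(); on ConnectionError, retry with exponential backoff.

    connect is any zero-argument callable that opens the connection
    (or runs the query) against the virtual IP.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Back off before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed; surface the last error to the caller.
    raise last_error
```

With the defaults, the waits between attempts are 0.5, 1, 2 and 4 seconds, which comfortably covers a failover that completes in seconds.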

I found this to be an ironic read. "How to architect for uptime from someone who hasn't quite figured it out themselves."

So should they not have posted at all or should they have used a different title?

If your article on how to architect for uptime ends with how the author's system isn't architected for uptime, I don't think they should have written it. How do we know their advice applies once they actually institute the changes they talk about?

This is a really good point and I appreciate your viewpoint, but just because I wrote the article for/about Stack Exchange doesn't preclude me from having done it differently at a prior employer.

Talk about that then? Make it clear some way that you're not just talking out of your tush?

"It looks like you actually haven't figured out that piece yet and are waiting for a vendor (Microsoft) to solve this for you."

Grown-up vendors have had this solved for decades, literally. E.g. http://en.wikipedia.org/wiki/IBM_Parallel_Sysplex

A handy summary of different solutions:

PostgreSQL: Comparison of Different Solutions http://www.postgresql.org/docs/current/static/different-repl...

as well as the rest of chapter 25.

Excellent points! You're quite right that there are a lot more things we could do to improve high availability in the Stack Exchange environment. Unfortunately, I was hired well after the environment was designed, so any suggested changes would need to not only pass the rigors of review by the team but also be able to handle the load that stackoverflow.com generates. Changing pieces at this point would be nontrivial, and it would be difficult to get approval if it meant affecting the performance of our main property.

That being said, I'm 100% in agreement with you that the two options you supplied would be wonderful additions to our environment.