| HN Mirror


	by vonmoltke 2688 days ago
	Bloomberg does it once a year. That said, I have yet to encounter a company more obsessed with business continuity. I don't doubt that their failover systems and testing of them are well beyond the typical.

surge 2688 days ago

I think Bloomberg can stand to be down for 8 hours to simulate a disaster. Banks with legacy systems and people constantly dependent on them to conduct business can't risk an actual incident happening because they were testing what would happen if an incident happened.

Netflix designed their stuff from the ground up to fail over. Large monolithic corporations that have inherited systems from companies they've bought or merged with face challenges you won't see at many places that have benefited from the 30 years of lessons learned at those companies.

vonmoltke 2688 days ago

> I think Bloomberg can stand to be down for 8 hours to simulate a disaster. Banks with legacy systems and people constantly dependent on them to conduct business can't risk an actual incident happening because they were testing what would happen if an incident happened.

No, it can't. Any loss of customer-facing functionality is a critical outage ("World Problem" in company terminology). There are a relatively small number of customers, but the terminal is critical to the operations of those who buy it. The terminal going down for eight hours would be a world-wide headline in the financial press.

A Tier 1 test that simulates loss of a datacenter takes a cluster in one DC virtually offline. This takes an entire subset of services in that DC offline. The test is coordinated with the teams who own the services to ensure their services fail over correctly. Any service disruption during the failover is a test failure. If it passes, the customers don't even know it happened. The goal is to be able to lose an entire DC and have the terminal customers not realize it until they hear about it on the news.
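
To make the pass/fail rule concrete, here is a minimal sketch of that kind of drill check. It is not Bloomberg's tooling; the `Service` type, the `run_dc_loss_drill` function, and the service and DC names are all made up for illustration. The only rule it encodes is the one described above: if any service is left with nowhere to run once the DC is taken offline, the test fails.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical model: a service and the DCs where it has a running replica.
    name: str
    datacenters: set[str]


def run_dc_loss_drill(services: list[Service], offline_dc: str) -> list[str]:
    """Return the names of services that would be disrupted if offline_dc went away."""
    disrupted = []
    for svc in services:
        remaining = svc.datacenters - {offline_dc}
        if not remaining:
            # No replica left anywhere else: this counts as a test failure.
            disrupted.append(svc.name)
    return disrupted


# Illustrative inventory; names are assumptions, not real systems.
services = [
    Service("quotes", {"dc-east", "dc-west"}),
    Service("chat", {"dc-east", "dc-west"}),
    Service("legacy-report", {"dc-east"}),
]

failed = run_dc_loss_drill(services, "dc-east")
print("PASS" if not failed else f"FAIL: {failed}")  # FAIL: ['legacy-report']
```

A real drill would observe live traffic rather than a static inventory, but the verdict has the same shape: a list of disrupted services that must be empty.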

icelancer 2688 days ago

> I think Bloomberg can stand to be down for 8 hours to simulate a disaster.

Do you know what Bloomberg does? It powers equities trading markets around the world, 24/7. It isn't just news.

kamikaz1k 2688 days ago

Well that's not true.

Chaos engineering and AWS weren't real things when they started building the company. And the system they have now doesn't much resemble what it once was.

Truth of the matter is they invested more in their infrastructure, but that's because their business plan required them to grow on the back of technological advances. Banks, it seems, do not. Or maybe they do, and some of these startup banks will usurp them.

ianai 2688 days ago

Standard good practice should be to have redundancy in place and test it at a regular interval. It should be part of periodic maintenance: fail over to the backup so updates/upgrades can be applied to production, then fail back to production once done.
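
A rough sketch of that maintenance cycle, assuming a simple primary/backup pair. The `RedundantPair` class and `maintenance_cycle` function are hypothetical names for illustration, not any real system's API. The point is that every routine upgrade also exercises the failover path in both directions.

```python
class RedundantPair:
    """Hypothetical primary/backup pair; fields are illustrative only."""

    def __init__(self, version: int) -> None:
        self.active = "primary"
        self.versions = {"primary": version, "backup": version}

    def fail_over(self, target: str) -> None:
        if target not in self.versions:
            raise ValueError(f"unknown node: {target}")
        self.active = target


def maintenance_cycle(pair: RedundantPair, new_version: int) -> list[str]:
    """Fail to backup, upgrade the idle primary, then fail back."""
    log = []
    pair.fail_over("backup")
    log.append(f"serving from {pair.active}")
    # Primary is idle now, so it can be upgraded without touching live traffic.
    pair.versions["primary"] = new_version
    log.append(f"primary upgraded to v{new_version}")
    pair.fail_over("primary")
    log.append(f"serving from {pair.active}")
    return log


pair = RedundantPair(version=1)
for line in maintenance_cycle(pair, new_version=2):
    print(line)
```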

But I’m guessing Wells Fargo just doesn’t have a reason to care.

closeparen 2688 days ago

Business critical systems can’t afford not to test failover.

You can bail out of a test at the first sign of trouble. When a real outage hits, there’s no telling how long it will take to recover.
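
As a sketch of what bailing out can look like: run the drill step by step, check health after each step, and undo everything in reverse order the moment a check fails. The `run_drill` function, the step names, and the health check are all assumptions made up for this example.

```python
from typing import Callable


def run_drill(
    steps: list[str], is_healthy: Callable[[str], bool]
) -> tuple[bool, list[str]]:
    """Run drill steps in order; abort and report the rollback order on first failure.

    Returns (True, completed steps) on success, or
    (False, steps to undo, most recent first) if a health check fails.
    """
    completed: list[str] = []
    for step in steps:
        completed.append(step)
        if not is_healthy(step):
            # First sign of trouble: stop here and unwind what was done.
            return False, list(reversed(completed))
    return True, completed


# Illustrative drill in which the second step triggers a failed health check.
steps = ["drain traffic", "stop primary", "promote backup"]
ok, trail = run_drill(steps, is_healthy=lambda step: step != "stop primary")
print(ok, trail)  # False ['stop primary', 'drain traffic']
```

In a real outage there is no such abort path, which is the asymmetry the comment above is pointing at.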
