Hacker News
by WestCoastJustin 2614 days ago
Much of this comes down to geo-redundancy too. These 10M people are not in one place, so you need redundant, highly available deployments in many regions. That duplicates a lot of infrastructure, which adds to the cost big time, but it is also super speedy for end users. For example, I was working with a gaming company and they had 10+ regions around the world, all using this type of setup, just to keep latency to an absolute minimum. I'm sure Slack is doing the same.
3 comments

There's a big difference, though: they're not all connected. Each team is its own separate entity. A team with 10 people might pay Slack $100/mo, and all be in the UK. Those people and that Slack team's database don't need to interact with anyone else in the system. There should be no big scaling constraints here, unlike your gaming company, where everyone needs to be connected from anywhere in the world, at the lowest possible latency.
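To make the "each team is its own separate entity" point concrete, here is a minimal sketch of pinning each team's data to a single home region. The region names, `home_region`, and `assign_team` are all hypothetical and only illustrate the idea; nothing here is Slack's actual design.

```python
from collections import Counter

# Hypothetical mapping of team id -> home region (not a real Slack structure).
team_shards = {}

def home_region(member_regions, default="us-east"):
    """Pick the region where most of a team's members are located."""
    if not member_regions:
        return default
    # most_common(1) returns [(region, count)] for the most frequent region.
    return Counter(member_regions).most_common(1)[0][0]

def assign_team(team_id, member_regions):
    """Record which region hosts this team's database."""
    team_shards[team_id] = home_region(member_regions)
    return team_shards[team_id]

# A 10-person team that is entirely in the UK lands in one region
# and never needs to touch any other team's shard.
uk_team = assign_team("uk-team", ["eu-west"] * 10)
```

With this kind of layout, adding teams adds shards rather than cross-region traffic, which is the contrast with a game where all players share one world.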
My company is not particularly big but just in my team of a dozen or so we have members in Boston, Australia, Phoenix, and Seattle, and regularly deal with those in London, Tokyo, etc. It's not that unusual; in fact it's part of the reason Slack (and Hangouts) are so important.
Hm, I may lack the proper knowledge. Why do I need redundant infrastructure? Why can’t I have instances in the cheapest region (a 300 ms delay is not going to kill anyone in a chat app), and if an instance fails, bring up a new instance, and if the region fails, bring up instances in another region? I don’t see why there should be redundant, idle instances running. Maybe duplicate the database / make it highly available.

I also don’t understand how duplicated infrastructure makes it super speedy for end users when they are from around the world. Yes, they could connect to regional instances, but then the regional instances must synchronize with the other regional instances on the other side of the planet, which gains nothing.
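The "cheapest region, fail over on demand" idea asked about above could be sketched roughly like this. The region names and the `choose_region` helper are made up for illustration, and the sketch assumes some external health check supplies the set of healthy regions.

```python
def choose_region(regions_by_cost, healthy):
    """Return the cheapest region that is currently healthy.

    regions_by_cost: region names ordered from cheapest to most expensive.
    healthy: set of region names an (assumed) health check reports as up.
    """
    for region in regions_by_cost:
        if region in healthy:
            return region
    raise RuntimeError("no healthy region available")

# Hypothetical regions, cheapest first.
regions = ["cheap-1", "mid-2", "pricey-3"]

normal = choose_region(regions, {"cheap-1", "mid-2", "pricey-3"})
after_outage = choose_region(regions, {"mid-2", "pricey-3"})
```

The trade-off with this approach is that users wait while the replacement instances start, whereas already-running redundant instances can take traffic right away.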

> which gains nothing.

It gains tens or hundreds of milliseconds, especially for channels/chats between people in the same region. You may not feel a "chat app" requires this level of performance, but the improvement is there.
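A back-of-the-envelope version of that arithmetic, with made-up numbers: assume 150 ms one-way to a single distant region (so roughly the 300 ms round trip mentioned upthread) and 10 ms one-way to an in-region instance. Both figures and the `delivery_ms` helper are assumptions for illustration, not measurements.

```python
def delivery_ms(sender_to_server, server_to_receiver, inter_region=0):
    """One-way message delivery time: sender -> server(s) -> receiver."""
    return sender_to_server + inter_region + server_to_receiver

FAR = 150   # assumed one-way ms to a single distant region
NEAR = 10   # assumed one-way ms to an instance in the user's own region

# Two users in the same region, everything hosted in one distant region.
central = delivery_ms(FAR, FAR)

# Same two users, served by an instance in their own region.
regional = delivery_ms(NEAR, NEAR)

# Users in different regions, each on a local instance, with the
# instances syncing over an assumed 150 ms inter-region link.
cross_region = delivery_ms(NEAR, NEAR, inter_region=FAR)

saved = central - regional
```

Under these assumptions the same-region case is where the big win is; the cross-region case still has to pay for the long hop once, so how much it gains depends on where the single central region would have been.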

Some of this infrastructure duplication ought to matter less at that scale, as there are likely a good number of users at any given location.