Hacker News
by toumhi 5650 days ago
That's desired behavior, although much more difficult in practice than it is in theory. If 20% of your network goes down and you can still serve clients normally, it means that you have a big reserve of machines useful only in case of big outages. I don't know if you can justify it economically.
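
To put a rough number on that reserve, here is a minimal sketch of the arithmetic. It assumes all machines are interchangeable and that load spreads evenly over the survivors; the function name and the figure of 100 machines at peak are made up for illustration.

```python
def machines_needed(peak_machines: int, failure_percent: int) -> int:
    """Fleet size such that the machines surviving a failure still cover peak load.

    peak_machines: machines required to serve peak load with nothing down
    failure_percent: share of the fleet assumed lost in an outage (0-99)
    """
    if peak_machines < 0 or not 0 <= failure_percent < 100:
        raise ValueError("need peak_machines >= 0 and 0 <= failure_percent < 100")
    surviving_percent = 100 - failure_percent
    # Ceiling division in integers, so there is no floating-point rounding.
    return -(-peak_machines * 100 // surviving_percent)


fleet = machines_needed(100, 20)   # hypothetical: 100 machines cover peak load
idle_reserve = fleet - 100         # machines that only matter during the outage
print(fleet, idle_reserve)         # prints "125 25"
```

So tolerating a 20% loss means buying about 25% more machines than normal operation needs, and that extra quarter is what has to be justified.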

You can also degrade gracefully by rejecting client connections, progressively disconnecting some clients, accepting a loss of consistency, etc. It depends on how far you can go without infuriating your customers.
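
As a rough illustration of the first two options, here is a minimal load-shedding sketch. Everything in it is an assumption: the 90% soft limit, the 5% shedding step, and the names `Decision` and `degrade` are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    accept_new: bool     # keep accepting new client connections?
    to_disconnect: int   # existing clients to drop in this round


def degrade(active: int, capacity: int,
            soft_limit_pct: int = 90, shed_step_pct: int = 5) -> Decision:
    """Pick a degradation step from the current connection count.

    Below the soft limit: business as usual.
    Between the soft limit and capacity: reject new connections.
    Above capacity: also disconnect clients, a small batch per round.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if active * 100 < soft_limit_pct * capacity:
        return Decision(accept_new=True, to_disconnect=0)
    if active <= capacity:
        return Decision(accept_new=False, to_disconnect=0)
    overload = active - capacity
    # Shed progressively: at most shed_step_pct of capacity per round.
    step = max(1, capacity * shed_step_pct // 100)
    return Decision(accept_new=False, to_disconnect=min(overload, step))


print(degrade(800, 1000))    # Decision(accept_new=True, to_disconnect=0)
print(degrade(950, 1000))    # Decision(accept_new=False, to_disconnect=0)
print(degrade(1200, 1000))   # Decision(accept_new=False, to_disconnect=50)
```

Calling it once per control interval gives the progressive behavior: an overloaded node sheds a small batch each round instead of dropping everyone at once, which would risk a reconnect storm of its own.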

We discovered that large-scale real-time systems (in our case, currently 400,000 concurrent connections) are really hard to stabilize against presence storms, network problems, and buggy clients, among others.


If 20% of your network goes down and you can still serve clients normally, it means that you have a big reserve of machines useful only in case of big outages. I don't know if you can justify it economically.

Just spin up more EC2 instances?

That's an interesting thought: in case of an outage, Skype could switch from user-supplied resources (supernodes eating users' bandwidth and processing power) to emergency Skype-hosted supernode services.
Yes, if you use an elastic cloud, by all means, spin up more instances :-) Most existing companies still have real servers, however.
Most existing companies don't run P2P voice chat networks, either. Using EC2 or some other elastic cloud for emergency supernodes makes a lot of sense, since they can outsource the risk of those machines sitting idle to Amazon.
The "cloud" is still made up of real servers ^.^