Hacker News
by kgeist 1182 days ago
>If there's a bug that brings the server down, it will happen in all instances and repeatedly no matter how many times you restart.

Not necessarily. Many bugs are rare and triggered only under specific conditions (a user, or the system, must do X, Y, Z at the right moment), so they don't happen all the time. But when one does, the whole server crashes or starts behaving in a funky way, and other users are affected. Sure, you may say that if it's a rare bug, users will rarely be affected. But we never have just a single bug like that; there are always N such bugs lurking around (in a large application we never know how many). Multiply by N and you have server crashes for different reasons quite often, making your paying customers dissatisfied. The argument also assumes you can fix such a bug immediately, which isn't always true: there are often Heisenbugs that take weeks to root out and fix, and your customers are affected the whole time. Sure, the application will restart, but ALL users (not just the one who triggered the bug) can lose work or get random errors while the app is unavailable, which is not a good experience. So having several app instances as backup softens such blows, because there will always be at least one instance available.
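To put rough numbers on the "multiply it by N" point, here is a back-of-the-envelope sketch. The function names and all input values are made up for illustration, and it assumes bugs fire independently and instances fail independently, which real systems often violate (e.g. the same bad request being retried against every instance).

```python
def p_any_crash(p_single: float, n_bugs: int) -> float:
    """Chance that at least one of n_bugs independent rare bugs fires in a
    given period, if each fires with probability p_single in that period."""
    return 1.0 - (1.0 - p_single) ** n_bugs


def p_all_down(p_instance_down: float, n_instances: int) -> float:
    """Chance that every instance is down at the same moment, assuming
    instances go down independently of each other (a strong assumption)."""
    return p_instance_down ** n_instances


# Hypothetical inputs: 30 lurking bugs, each with a 1% chance per day.
daily_crash = p_any_crash(0.01, 30)
print(f"chance of at least one crash per day: {daily_crash:.2%}")

# Hypothetical: an instance is unavailable 5% of the time; compare 1 vs 3.
for n in (1, 3):
    print(f"{n} instance(s): all down {p_all_down(0.05, n):.4%} of the time")
```

With those made-up inputs, a 1% per-bug chance already adds up to roughly a one-in-four chance of a crash on any given day, which is the "quite often" above.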

>Entropy increases with complex setup. The whole point of not having a complex setup is to reduce entropy and make the system as whole more predictable

I agree that entropy increases with a complex setup, but there's also a base entropy that accumulates simply with time (which I think is more dangerous). Make a sufficient number of changes to your application's setup (which you often need to if you release often) and eventually someone or something will make a mistake or expose a bug somewhere. You will need to repair it, and you won't be able to do it easily because your setup is not containerized; containers would let you return to a clean state with little effort. We've had issues like that with our non-containerized deployments, and doing the repair flawlessly (no downtime or regressions) is a very complex and error-prone undertaking compared to containerized deployments.

>Plus modern CPUs are incredibly fast and can process several GBs of data per second. Even in the worst cases, you should be able to rebuild all your caches in a second

Hm, usually caches are placed in front of disk-based DBs to speed up I/O, i.e. it's not a matter of slow CPUs, it's a matter of slow I/O. Rebuilding everything in the caches from the DB is not super fast.
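A rough way to estimate the rebuild time, as a sketch: the two helpers below and every number fed into them are hypothetical, not measurements. The first models a bulk sequential read, the second models warming the cache one DB query per key.

```python
def sequential_rebuild_seconds(cache_bytes: float, read_bytes_per_s: float) -> float:
    """Time to re-read cache_bytes from storage at a sustained read rate."""
    return cache_bytes / read_bytes_per_s


def keyed_rebuild_seconds(n_keys: int, seconds_per_query: float, concurrency: int) -> float:
    """Time to repopulate n_keys entries with one DB query each, running
    `concurrency` queries in parallel (ignores DB contention)."""
    return n_keys * seconds_per_query / concurrency


# Hypothetical: 10 GB of cached data, disk sustaining 200 MB/s.
print(sequential_rebuild_seconds(10e9, 200e6), "seconds (bulk read)")

# Hypothetical: 1M cached entries, 1 ms per query, 16 parallel queries.
print(keyed_rebuild_seconds(1_000_000, 0.001, 16), "seconds (per-key queries)")
```

Either way the CPU's throughput never appears in the formula; the estimate is dominated by how fast the DB and disk can hand the data back.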

2 comments

> and you will need to repair it and you won't be able do it easily because your setup is not containerized which would allow to return to the clean state quite easily with no effort.

Automated deployment, including server bring-up, is orthogonal to using containers or hot failover. For example, at $WORK we're deploying Unreal applications to bare-metal Windows machines without containers, because Windows containers aren't as frictionless as Linux ones and the required GPU access complicates things further.

Note that you can totally have more than one instance of the same app/binary running on the same machine. You don't even need containers for that.
But then you need some kind of load balancer, which hsn915 said was "too complicated".
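For a sense of how much logic "some kind of load balancer" needs at minimum, here is a toy sketch of round-robin selection over several instances on one machine, skipping ones marked unhealthy. The class name, addresses and ports are invented for illustration; a real setup would more likely use an existing reverse proxy and would also need health checks and actual request forwarding.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Picks backends in rotation, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


# Two hypothetical instances of the same binary on one machine.
lb = RoundRobinBalancer([("127.0.0.1", 8001), ("127.0.0.1", 8002)])
first, second = lb.pick(), lb.pick()   # alternates between the two ports
lb.mark_down(("127.0.0.1", 8001))      # e.g. that instance just crashed
after_crash = lb.pick()                # only the surviving instance is returned
```

The selection itself is small; the complexity hsn915 objects to is more plausibly in operating it (health checks, config, one more moving part) than in the algorithm.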