by umarniz 2142 days ago
Interesting read. As a thought experiment, it makes me wonder: does it count as downtime if the latency of commands on the machines rises to 5 minutes?

You could clone the VM to another instance, record the commands going to VM1, and replay them to VM2 after 5 minutes.

This whole brain fart of mine doesn't make much sense, but if you play along with it, does it still count as downtime or just very high latency?
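For anyone who wants to play along, here is a minimal sketch of the record-and-replay idea in Python. It is only an illustration: `DelayedReplayer`, its methods, and the command strings are made-up names, the clock is passed in explicitly so the behavior is easy to follow, and it assumes timestamps never go backwards.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayedReplayer:
    """Hypothetical helper: records commands and releases them after a fixed delay."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay = delay_seconds
        # (recorded_at, command) pairs, kept in arrival order.
        self._log: Deque[Tuple[float, str]] = deque()

    def record(self, command: str, now: float) -> None:
        """Store a command that was just sent to VM1."""
        self._log.append((now, command))

    def due(self, now: float) -> List[str]:
        """Pop every command recorded at least `delay` seconds ago, in order."""
        ready: List[str] = []
        while self._log and now - self._log[0][0] >= self.delay:
            ready.append(self._log.popleft()[1])
        return ready


replayer = DelayedReplayer(delay_seconds=300)
replayer.record("SET a 1", now=0)
replayer.record("SET b 2", now=100)
print(replayer.due(now=299))  # []
print(replayer.due(now=300))  # ['SET a 1']
print(replayer.due(now=400))  # ['SET b 2']
```

Whatever `due` returns would then be sent to VM2, so VM2 sees the same commands in the same order, just 5 minutes late.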

3 comments

It depends on how downtime is defined in the contract.

That sounds like I'm being snarky but I mean it - whether an actual legal contract or just the documentation given to users, any system where downtime matters should have some discussion of what impacts downtime can have and how it's measured and managed.

That documentation is what defines "downtime".

I'll add that what you've described is a sort of low-fi manual version of DB replication (https://en.m.wikipedia.org/wiki/Replication_(computing)).

Wouldn't requests time out on the client side long before five minutes?

I don’t know whether it’s the software in general, but ever since I started using Three 4G broadband in the UK, all of the software has been behaving really weirdly (lots of lockups, hangs, etc.). Apps often need to be restarted.

If you run a ping during “bad weather”, you can see that they buffer up to 5 minutes of packets (i.e. there will be no communication for some time, then you’ll receive a bunch of them at once with huge latency and the sequence intact).

So I would assume a lot of software could even work that way. I think a lot of software doesn’t set any (TCP) timeouts at all.
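To make the timeout point concrete, here is a rough sketch in Python, where a socket blocks indefinitely on reads unless a timeout is set. The names `read_with_timeout` and `demo` are made up for illustration, and the silent local listener only stands in for a peer whose packets are being held back; it is not a model of what any particular network does.

```python
import socket


def read_with_timeout(host: str, port: int, timeout_seconds: float) -> bytes:
    """Connect and read once, giving up if nothing arrives within the timeout."""
    with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
        # The timeout stays set on the connected socket, so recv() raises
        # socket.timeout instead of blocking forever.
        return conn.recv(1024)


def demo() -> str:
    # A local listener that completes the TCP handshake but never sends data.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    try:
        read_with_timeout("127.0.0.1", port, timeout_seconds=0.2)
        return "received data"
    except socket.timeout:
        return "timed out"
    finally:
        server.close()


if __name__ == "__main__":
    print(demo())
```

Without the `timeout` argument, the same `recv()` call would simply wait, which is the kind of software that would sit through a 5 minute gap without noticing.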

That works where you have control over all of the timeouts and failure detection at every level and layer. TCP keepalives, for example, could thwart you. Or client-side timeouts, or firewall connection state tables, etc.

5 minutes of unplanned downtime in a pub/sub setup could easily go unnoticed, since that setup is typically tuned for long timeouts and/or repeated retries.