by SergeAx 1626 days ago
What is your strategy for dealing with server failure?
3 comments

My app allows for a "share nothing" architecture, basically using multiple DNS A records for load balancing. It currently has 6 servers.

Even if 5 of them went down at the same time, the site would still work as intended (though it probably couldn't handle peak load with fewer than 3 or 4). If one or two are down, nothing happens.

Also completely reinstalling a server takes around an hour.
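For anyone wondering what "multiple A records as load balancing" means for a client, here is a minimal Python sketch, not the poster's actual setup. It resolves every IPv4 address behind a hostname and tries them in order until one accepts a connection. The function names, the 2-second timeout, and the injectable `resolver`/`connect` parameters (there only so it can be exercised without a network) are assumptions for illustration.

```python
import socket


def resolve_all(host, port, resolver=socket.getaddrinfo):
    """Return the unique IPv4 addresses behind a hostname, in resolver order."""
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_alive(addresses, port, timeout=2.0,
                        connect=socket.create_connection):
    """Try each address in turn; return (ip, connection) for the first success."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # This server is down or unreachable: fall through to the next record.
            last_error = exc
    raise ConnectionError(
        f"no server reachable out of {len(addresses)}"
    ) from last_error
```

Browsers and most HTTP clients already do something similar on their own, which is why a dead server behind one of several A records usually only costs a connection timeout rather than an outage.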

Yes, but the comment I was replying to mentioned a 48 euro budget, which is the price of a single server.
I do not have an HA setup, BUT all of my Proxmox VMs are backed up as snapshots by Proxmox Backup Server every night to my home NAS and to my office NAS. You can use one of many storage providers. SSHFS also works. This is the cheapest and lowest-administration solution I have used so far. For production usage I would recommend 3-4 similarly specced 28€ machines running a replicated Proxmox cluster or a Ceph Proxmox cluster.
As long as you treat servers as cattle, you can use Hetzner's own load balancer service and then you don't have a SPOF that you manage yourself. Their LBs are advertised as redundant / fault tolerant.
I don't know if it is possible to treat "Serverbörse" servers as cattle. They are all different. I know that k8s and Docker Swarm could in theory balance load between different machines, but I have never tried it in practice. What I have seen in practice are some weird glitches with different CPUs/motherboards/memory.

Also, the comment I was replying to mentioned a 48 euro budget, which is the price of a single server.

i think the whole point of VMs/containers is to enable "servers as cattle"
Yes, but "Serverbörse" machines are very non-uniform, which may be bad for load balancing. See for yourself: https://www.hetzner.com/sb
Still, you can get a nice server with an i7 CPU, 32 GB of memory, and 20 TB of traffic for ~28 euro, VAT incl. This is super competitive even with DO or others.
yes, it may be bad for indiscriminate/dumb load balancing. then again, this only works in the first place if the work units represent a somewhat equivalent and small-ish workload.

once you place value on determinism (with regard to time spent on a task) you want a tightly specced distribution mechanism and/or a feedback loop to communicate busy state back to the LB.
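A rough Python sketch of that feedback-loop idea, under stated assumptions: each backend carries a relative `capacity` figure (hypothetical, e.g. from a benchmark of the non-uniform machines) and an `in_flight` counter that stands in for the busy state reported back to the balancer. The names `Backend`, `pick_backend`, `dispatch`, and `complete` are made up for illustration and do not correspond to any particular load balancer's API.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    capacity: float     # relative capacity of this machine (assumed, e.g. benchmarked)
    in_flight: int = 0  # busy state: requests currently being handled


def pick_backend(backends):
    """Pick the backend with the lowest load relative to its capacity."""
    return min(backends, key=lambda b: b.in_flight / b.capacity)


def dispatch(backends):
    """Route one request and record it against the chosen backend."""
    chosen = pick_backend(backends)
    chosen.in_flight += 1
    return chosen


def complete(backend):
    """Feedback step: the backend reports that a request has finished."""
    backend.in_flight = max(0, backend.in_flight - 1)
```

With one machine rated at twice the capacity of another, this sends roughly two requests to the bigger one for every request to the smaller one, and a backend that is slow to finish work naturally receives less of it.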

I have 3 servers: 2 of the 3 don't have hardware timestamping on the network interface, 1 does. It makes a huge difference when it comes to NTP.
Which Hetzner servers have hardware timestamping on the interface?

I'd be really interested to know, since to the best of my knowledge, they don't have PTP solutions in their datacenter.