Hacker News
by shiftpgdn 1317 days ago
I have a Sun Solaris box in my office that was powered up in 1998 and has faithfully served NIS/YP without a hardware or software fault since that day.

The modern version of this (Kubernetes + AWS/GCP), if designed properly, could likely keep running for a long, long time, especially for a product as simple as Twitter.

Congratulations, but that is unrelated to what Twitter is doing. How would your Solaris box hold up to half a billion tweets a day distributed in near real time across a user graph with 100M nodes, all while storing those tweets durably and allowing users to search and retrieve a long history of them? It's not simple at all.
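
A rough back-of-envelope sketch of the load those figures imply. The 500M tweets/day number comes from the comment; the average follower count is a made-up illustrative assumption, not a real Twitter statistic, and "deliver every tweet to every follower's timeline" is only one possible way to distribute tweets across the graph, not necessarily Twitter's design.

```python
# Back-of-envelope load estimate for "half a billion tweets a day".
# AVG_FOLLOWERS is a hypothetical assumption chosen for illustration only.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

TWEETS_PER_DAY = 500_000_000  # figure from the comment
AVG_FOLLOWERS = 200           # hypothetical average audience per tweet


def tweets_per_second(tweets_per_day: int) -> float:
    """Average write rate; real traffic is bursty, so peaks run well above this."""
    return tweets_per_day / SECONDS_PER_DAY


def deliveries_per_second(tweets_per_day: int, avg_followers: int) -> float:
    """Timeline deliveries per second if each tweet is pushed to every follower."""
    return tweets_per_second(tweets_per_day) * avg_followers


if __name__ == "__main__":
    print(f"{tweets_per_second(TWEETS_PER_DAY):,.0f} tweets/s on average")
    print(f"{deliveries_per_second(TWEETS_PER_DAY, AVG_FOLLOWERS):,.0f} deliveries/s")
```

Even before durability, search, or history, that is roughly 5,800 new tweets every second on average, each multiplied by its audience.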

Unlike your Solaris box, they are the target of constant, advanced hacking attempts. I've been part of the response when AWS was doing urgent work because of a security incident. The company I worked at was large enough to be paying AWS over $1M a month when one such incident required dozens of our engineers working around the clock for three days to deal with AWS's response. We weren't even directly involved in the security issue, but without that engineering effort, our product would have shut down. There were other security incidents we were directly involved in, and those would have taken us down without an even bigger response (whether or not we were running in AWS).

And then there are hardware failure rates. Hard drives alone fail at a rate of 1-2% per year[0]. Not a big deal on a single box. A very big deal when you have many thousands of hard drives: multiple drives fail every day. It stays a big deal unless you WAY over-allocate storage for redundancy, and even then there are surprising vulnerabilities to hardware failure at this scale.
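
A quick sketch of the arithmetic behind "not a big deal on a single box, a very big deal at scale". The 1-2% annual failure rate is the figure cited above; the fleet sizes are hypothetical examples, and the code assumes drives fail independently with a constant daily hazard, which real fleets only approximate.

```python
# Expected drive failures per day for a fleet, given an annualized failure
# rate (AFR). Fleet sizes are hypothetical; independence of failures and a
# constant daily hazard are simplifying assumptions.

DAYS_PER_YEAR = 365


def expected_failures_per_day(num_drives: int, afr: float) -> float:
    """Expected failures per day, assuming failures are spread evenly over the year."""
    return num_drives * afr / DAYS_PER_YEAR


def prob_any_failure_in_a_day(num_drives: int, afr: float) -> float:
    """Chance that at least one drive fails on a given day.

    Derives a per-drive daily survival probability from the AFR and
    assumes drives fail independently of each other.
    """
    daily_survival = (1.0 - afr) ** (1.0 / DAYS_PER_YEAR)
    return 1.0 - daily_survival ** num_drives


if __name__ == "__main__":
    for fleet in (1, 10_000, 100_000):  # hypothetical fleet sizes
        for afr in (0.01, 0.02):
            print(
                f"{fleet:>7,} drives @ {afr:.0%} AFR: "
                f"{expected_failures_per_day(fleet, afr):.4f} failures/day, "
                f"P(>=1 failure today) = {prob_any_failure_in_a_day(fleet, afr):.2%}"
            )
```

At 1-2% AFR, a single drive is expected to fail about once every 50 to 100 years, while a hypothetical 100,000-drive fleet averages roughly 2.7 to 5.5 failures per day.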

----

[0] https://www.backblaze.com/b2/hard-drive-test-data.html

But hard drive failures are exactly why you pay a cloud company that offers live migration (i.e., not AWS) for their service. The physical hardware the VM is running on will eventually fail, as you note, but the VM will keep ticking along basically forever*, and you'd never know the hard drive/SSD underneath it failed.

* Live migration won't upgrade the CPU family you're running on, so eventually someone or something on your end will be forced to deal with migrating it, but that's O(years).