by dc_gregory 2670 days ago
I'm inexperienced on the hardware front: would that machine likely not break down under a large load for 3 years straight? Nothing is set aside for hardware failure, etc.
7 comments

I have 30 machines running at 100% load for an hour, then 25% load for an hour, the pattern repeats. Every quarter they run 100% for about a week.

Every 6 months or so a hard drive fails (out of 16 per server); no other components have failed. 10 machines are 6 years old (used as a test system since they're out of warranty), 10 are 3 years old, and 10 are new.
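
To put that failure rate in perspective, here is a rough sketch of the implied annualized drive failure rate. It assumes the "every 6 months or so" figure applies to the whole fleet of 30 machines (the comment doesn't say so explicitly), and the names are purely illustrative.

```python
# Rough annualized failure rate (AFR) for the drives described above.
# ASSUMPTION: one failure per ~6 months across all 30 servers, 16 drives each.

SERVERS = 30
DRIVES_PER_SERVER = 16
FAILURES_PER_YEAR = 2  # "every 6 months or so"


def annualized_failure_rate(failures_per_year: float, drive_count: int) -> float:
    """Return the fraction of drives expected to fail in a year."""
    return failures_per_year / drive_count


if __name__ == "__main__":
    total_drives = SERVERS * DRIVES_PER_SERVER
    afr = annualized_failure_rate(FAILURES_PER_YEAR, total_drives)
    print(f"{total_drives} drives, AFR of roughly {afr:.2%}")
```

If the failure rate were instead per server, the number would be far higher, so treat this as a lower bound on what the comment might mean.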

There's also 20 or so other servers under different loads, I've not had anything fail other than hard drives.

When I started the job, there were spare power supplies for some 10+ year old servers in storage, so I'm either lucky or reliability is improving.

Our on-prem server has been running for about 7 years under a high load. It's a little unpredictable, but I would expect a server to last longer than 3 years.
No, not unless it's badly designed.

I used to work for a VFX company where we would try to get as close to 100% utilization out of the farm as possible.

We had machines that were 4 years old, still merrily plodding away. (Not too many, mind; they ate more power than they were worth.) The things that tended to fail were hard drives and fans.

Depends what the warranty is, I think three years is standard and you can pay for five or seven. (You pay commensurately, however.)
Ah, excellent point, a warranty is something I completely overlooked. Probably some value "lost" doing it yourself due to downtime if something goes wrong, but likely negligible.
As long as it's run within its thermal limits (i.e. it isn't severely overclocked and has adequate cooling/ventilation), 3 years or even 5 years isn't an unrealistic lifespan. The analysis makes this and many other implicit assumptions, which in general seem reasonable.
On the other hand, they didn't calculate in any residual value after 3 years.
Because it's negligible
But isn't that $69,000 peanuts compared to the cost to hire sysadmins who are on call 24/7 to swap out: RAM, fans, drives, power supplies, provision new images, etc.?? So you saved on cloud costs, but now you have the admin burden: 1-3 people for $200k each. No?
From having managed several racks worth of equipment in two separate data centres 1-2 hours travel from my office at the time: I cost closer to that $200k/year, but I also only usually visited the data centres 2-3 times a year, and other than that we used "remote hands" at the colo to do maintenance and be on-call 24/7. On ~60+ servers, we had maybe on average one minor incident every couple of months that required physical intervention.

Between two data centres, let's assume I did 6 visits a year, and that we had one ~30-minute incident a month at $50/incident (it was less, but I don't remember the exact details, and it doesn't matter for this exercise). Let's assume I lost a whole day every visit (I didn't, though it got close at times), and "charge" $1000/day for my visits. That adds up to $6k/year for my time, and $600/year for remote hands, or ~$110/year per server. For comparison the colocation cost us ~$17k/year, or ~$283/year per server. These costs were pretty stable by number of servers, and so favored using fewer, more powerful servers than we might have otherwise.
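
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched as below. The figures are the ones quoted in the comment, the 60-server count comes from the "~60+ servers" mentioned earlier, and the variable and function names are just illustrative.

```python
# Back-of-the-envelope per-server maintenance and colocation costs.
# All inputs are the round figures quoted above; adjust to taste.

SERVERS = 60               # "~60+ servers"
VISITS_PER_YEAR = 6
COST_PER_VISIT = 1000      # USD, a whole day "charged" per visit
INCIDENTS_PER_YEAR = 12    # one remote-hands incident a month
COST_PER_INCIDENT = 50     # USD
COLO_PER_YEAR = 17_000     # USD


def per_server_costs() -> dict:
    """Return yearly totals and per-server costs in USD."""
    own_time = VISITS_PER_YEAR * COST_PER_VISIT
    remote_hands = INCIDENTS_PER_YEAR * COST_PER_INCIDENT
    return {
        "own_time": own_time,
        "remote_hands": remote_hands,
        "maintenance_per_server": (own_time + remote_hands) / SERVERS,
        "colo_per_server": COLO_PER_YEAR / SERVERS,
    }


if __name__ == "__main__":
    for name, value in per_server_costs().items():
        print(f"{name}: ${value:,.2f}")
```

Even doubling the visit count or the incident cost only moves the per-server figure by a few hundred dollars a year at most, which is the point of the comparison with a $200k salary.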

So that added the cost of renting space at a manned colo facility instead of having the servers in the office (we did have a rack of servers that didn't need 24/7 attention at our office as well).

The rest of my time was spent on devops work, which in my experience tends to be more expensive on cloud setups because complexity tends to be higher. (I base that on having contracted to do this kind of work on AWS too, and knowing the difference in billable hours I'd typically get per instance on AWS vs. per physical server on colocated setups.)

> costs were pretty stable by number of servers, and so favored using fewer, more powerful servers

Didn't this make each failure a bigger hit to your overall capacity? How much redundancy did you have? I used to work in adtech with colocated hardware, and it was old and failed a lot, but they had enough it didn't matter ("we're down 5/120 in Germany but we can swap them out while we're there next month").

In that case it was ~60 servers, so losing any single server made little difference, but yes it of course needs to be a consideration if your number of servers is low enough.

It was also a fully virtualised setup that could also tie in rented dedicated servers or cloud instances via VPNs as needed. So where it made sense or if we had an urgent need, we had the ability to spin things up as needed.

E.g. we had racks in London, but rented servers at Hetzner in Germany (Hetzner got close to the cost of the colocated servers, though mostly because rack space in London is ridiculously expensive; it might actually have saved us money to put servers in their colo facilities in Germany, even with the cost of travel to/from them occasionally)

This sounds like a great use case for AWS -- as cheap insurance. Just set up a VPC and VPN, and if you have hardware fail, just spin up an instance in AWS until you can replace your physical hardware. Pay $40/mo or so to keep the VPN active.
From the article: "Our TCO [(Total Cost of Ownership)] includes energy, hiring a part-time system administrator, and co-location costs"
A lot of times when you colo you can get staff there to take care of those sorts of tasks. Their cost breakdown includes using this service.
If you have only a single server, and you need 24/7 support, then probably it is true you don't want to hire a full-time sysadmin for one server. But, for the people who are doing this kind of thing, they probably have more than one, so the cost of sysadmin is spread across more than one server.
> Our TCO includes energy, hiring a part-time system administrator, and co-location costs. In addition, you still get value from the system after three years, unlike the AWS instance.

They covered that I think.