Hacker News
by michaelt 2829 days ago

  24x7 coverage with a short time-to-repair costs at
  a minimum several million dollars per year.
Interesting. What are the constituents of that cost?

What sort of challenges do you face? Do you use PTP grandmaster clocks, or something else? How many sites, and how many clocks per site? Are the support issues mostly hardware failures, configuration problems, or something else? Is 24/7 support needed because the equipment lacks failover support, or is the failover support unreliable or insufficient?


You generally need at least 4-5 SREs for a high-availability subsystem at large (big-5) scale in a multinational corp, just to cover all of the time zones and to make sure you're not frantically calling everyone when someone goes on vacation or has to pick up their kid from the nurse. Salary plus benefits and overhead for that team easily runs into the millions.
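A rough back-of-envelope version of that staffing argument, as a sketch only. The fully loaded cost per engineer (salary + benefits + overhead) is a hypothetical placeholder, not a figure from this thread, and the even split of on-call hours is a simplifying assumption. The `absent` parameter models the "someone goes on vacation" case:

```python
# Back-of-envelope cost of a 24x7 on-call rotation.
# ASSUMPTION: the loaded cost per engineer below is a hypothetical placeholder,
# and on-call hours are assumed to be shared evenly across whoever is available.

HOURS_PER_WEEK = 24 * 7  # 168 hours of coverage needed every week


def oncall_hours_per_person(team_size: int, absent: int = 0) -> float:
    """Average weekly on-call hours per available engineer."""
    available = team_size - absent
    if available <= 0:
        raise ValueError("nobody left to carry the pager")
    return HOURS_PER_WEEK / available


def annual_team_cost(team_size: int, loaded_cost_per_engineer: float) -> float:
    """Yearly cost of the rotation: headcount times fully loaded cost."""
    return team_size * loaded_cost_per_engineer


if __name__ == "__main__":
    ASSUMED_LOADED_COST = 350_000  # hypothetical USD per engineer per year
    for n in (4, 5):
        print(
            f"{n} SREs: {oncall_hours_per_person(n):.1f} h/week each, "
            f"{oncall_hours_per_person(n, absent=1):.1f} h/week with one away, "
            f"${annual_team_cost(n, ASSUMED_LOADED_COST):,.0f}/year"
        )
```

With four people, one vacation pushes everyone else from 42 to 56 on-call hours a week, which is why the headcount floor sits around 4-5 even before cost is considered.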
I think the claim was about Google's costs specifically. I read somewhere that Google operates two atomic clocks in each of its data centers, but I can't find a source for it right now, just this: https://www.wired.com/2012/11/google-spanner-time/
Atomic clocks aren't all that expensive. You can get a decent rubidium one for US $5K.
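To put the hardware side in numbers, here is a minimal sketch that multiplies the $5K price quoted above by the unverified "two clocks per data center" figure. The site counts are hypothetical, and the total covers the clocks alone (no GPS receivers, installation, or maintenance):

```python
# Rough one-off hardware cost of on-site atomic clocks.
# ASSUMPTIONS: $5,000 per rubidium clock is the price quoted in the thread;
# two clocks per site echoes the unverified claim about Google;
# the site counts in the demo are hypothetical placeholders.

CLOCK_PRICE_USD = 5_000
CLOCKS_PER_SITE = 2


def clock_hardware_cost(
    sites: int,
    clocks_per_site: int = CLOCKS_PER_SITE,
    price: int = CLOCK_PRICE_USD,
) -> int:
    """Purchase cost of the clocks alone, excluding receivers, install, and staff."""
    return sites * clocks_per_site * price


if __name__ == "__main__":
    for sites in (10, 20, 40):  # hypothetical site counts
        print(f"{sites} sites: ${clock_hardware_cost(sites):,}")
```

Under these assumptions even 40 sites comes to a one-off $400,000, well below the recurring multi-million staffing figure mentioned earlier in the thread.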