| HN Mirror


by lobochrome 849 days ago

In those cases, the CPU makes a wrong calculation independent of what's done in RAM. It can be solved by having flop redundancy as in IBM System z, but nobody at Google or Meta would be considering big metal.

From my point of view, this technology problem may be interesting academically (and good for pretending to be important in the hierarchy at those companies), but business-wise it is a non-issue at scale in modern data centers.

Have a blade that once in a while acts funny? Trash and replace. Who cares what particular hiccup the CPU had.


delroth 849 days ago

> a non-issue at scale business-wise in modern data centers.

I've worked on similar stuff in the past at Google and you couldn't be more wrong. For example, if your CPU screwed up an AES calculation involved in wrapping an encryption key, you might end up with fairly large amounts of data that can't be decrypted anymore. Sometimes the failures are symmetric enough that the same machine might be able to decrypt the data it corrupted, which means a single machine might not be able to easily detect such problems.

We used to run extensive crypto self testing as part of the initialization of our KMS service for that reason.
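To illustrate why a round-trip check on a single machine isn't enough, here is a minimal sketch. It is not the actual KMS self-test, and it doesn't use real AES: the toy stream cipher (SHA-256 in counter mode), the `faulty` flag that simulates a deterministic bit flip, and all names here are assumptions made up for the example. The point it shows is that a machine with a "symmetric" fault passes its own encrypt-then-decrypt test, but fails a known-answer test against a vector computed elsewhere.

```python
import hashlib


def keystream(key: bytes, length: int, faulty: bool = False) -> bytes:
    """Toy keystream: SHA-256 in counter mode. A stand-in for AES, not real AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    if faulty:
        # Simulated deterministic CPU defect: the same bit is wrong every time.
        out[0] ^= 0x01
    return bytes(out[:length])


def encrypt(key: bytes, data: bytes, faulty: bool = False) -> bytes:
    ks = keystream(key, len(data), faulty)
    return bytes(a ^ b for a, b in zip(data, ks))


# XOR stream cipher: decryption is the same operation as encryption.
decrypt = encrypt


def round_trip_ok(key: bytes, data: bytes, faulty: bool) -> bool:
    """Encrypt and decrypt on the same (possibly faulty) machine."""
    return decrypt(key, encrypt(key, data, faulty), faulty) == data


def known_answer_ok(key: bytes, data: bytes, expected: bytes, faulty: bool) -> bool:
    """Compare against a ciphertext that was computed on trusted hardware."""
    return encrypt(key, data, faulty) == expected


KEY = b"k" * 16
PLAINTEXT = b"wrapped-key-material"
# Stand-in for a fixed test vector shipped with the service.
EXPECTED = encrypt(KEY, PLAINTEXT, faulty=False)

if __name__ == "__main__":
    print("faulty machine, round trip:", round_trip_ok(KEY, PLAINTEXT, faulty=True))
    print("faulty machine, known answer:", known_answer_ok(KEY, PLAINTEXT, EXPECTED, faulty=True))
    # Data written by the faulty machine does not decrypt correctly elsewhere.
    print("healthy machine reads it:", decrypt(KEY, encrypt(KEY, PLAINTEXT, faulty=True)) == PLAINTEXT)
```

With a real cipher you would use published test vectors instead of computing `EXPECTED` in the same process; it's done that way here only to keep the sketch self-contained.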


lobochrome 849 days ago

Sure. It’s a cool issue to work on and maybe actually relevant at Google scale. But I’ve asked your colleagues multiple times if the business side actually cared about the issue and they never confirmed.

Again, cool to work on at Google. Not sure anybody else cares. If you care (finance) you fix it in hardware (System z).


withinboredom 849 days ago

Why would the business side ever care about technical details? It's like asking the business what days the dumpsters get emptied. Nobody gives a fuck; they just care that it gets done and gets done quickly, correctly, and safely.


lobochrome 848 days ago

A CFO knows which factors have a significant impact on the bottom line.


withinboredom 847 days ago

If a CFO knows which days the dumpster is emptied, you have a strange CFO. The metaphor is to point out that there are a lot of technical details that aren’t tracked (refactoring, for example, usually isn’t tracked independently) and shouldn’t be tracked because they are a normal part of the technical job. A CFO can’t measure it even if they wanted to because nobody else is measuring crazy things, like how fast you walk to the bathroom or any other metrics that are specifically related to doing your job.
