HN Mirror



	by macintux 793 days ago
	Jim Gray wrote a classic paper about fault tolerance that I often reference when talking about Erlang: Why Do Computers Stop and What Can Be Done About It? http://jimgray.azurewebsites.net/papers/tandemtr85.7_whydoco...


usrnm 793 days ago

> In the future, hardware will be even more reliable due to better design, increased levels of integration, and reduced numbers of connectors

I couldn't help laughing at that


sillywalk 792 days ago

He did an unofficial follow-up report[0], based on Tandem customer data from 1985-1989. He mentions that the big hardware improvements (at least for Tandem Computers) were the switch to VLSI logic, hard disks that didn't require any maintenance, and the use of fiber optic connections.

I still find Tandem NonStop Systems interesting, and they're still being sold by HPE running on standard x86 servers.

[0] https://jimgray.azurewebsites.net/papers/TandemTR90.1_WhySto...


usr1106 793 days ago

Better design enabling rowhammer, meltdown, and the like...

But when it comes to failures, I would bet things have improved if you measure failures per operation.

Computers did not fail often 30 years ago. If they failed orders of magnitude more often nowadays, we would definitely notice.

I have absolutely no numbers on reliability in any kind of metric.
