Is hardware (motherboard, CPU, memory) so good nowadays that one can expect it to last 30 years? I don't think it's designed with that kind of lifetime in mind.
we don't exactly run the biggest operation, but in our experience the most common failure items over thousands of years of cumulative uptime are network interface cards (or on-motherboard network interfaces) and hdd's.
RAID controllers fail left and right. we keep tons of spares around.
ssd's fail few and far between, cpus basically do not fail, and memory can go bad but it's exceedingly rare and easy to fix. psu's fail but are easy to fix in modern computers as well (slide-out, redundant, etc.)
having said all that, heat is the primary killer of hardware. if you run a lot of equipment in a dense environment, get a laser thermometer and find your hot spots and fix them with some industrial fans or move your gear around. once your stuff gets hot anything can fail in weird and mysterious ways.
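for what it's worth, here is a rough sketch of how the hot-spot hunt could be scripted once you have readings written down. everything in it is an assumption for illustration: the location names, the temperatures, and the 10 °C margin over the median are made up, and `find_hot_spots` is a hypothetical helper, not part of any tool.

```python
from statistics import median


def find_hot_spots(readings, margin_c=10.0):
    """Return locations running more than margin_c above the median, hottest first.

    readings: dict mapping a location label to a temperature in Celsius.
    margin_c: assumed threshold; tune it for your own room.
    """
    baseline = median(readings.values())
    hot = [loc for loc, temp in readings.items() if temp - baseline > margin_c]
    return sorted(hot, key=lambda loc: readings[loc], reverse=True)


# Hypothetical thermometer readings, one per spot in the room.
readings = {
    "rack1-top": 41.0,
    "rack1-mid": 29.5,
    "rack2-top": 30.0,
    "rack2-mid": 28.0,
    "rack3-top": 46.5,
}
print(find_hot_spots(readings))
```

comparing against the median rather than a fixed number means the check still works if the whole room runs warm or cool; a fixed absolute limit would be just as reasonable.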
Depends on which bit fails, but increases in packet loss are a common early symptom of small components no longer acting within their specs.
Network cards are subject to lots of signal phenomena that are rare inside the chassis. Long cables are pretty good antennas for certain types of RF signals, so there are all kinds of electrical noises, induced power spikes and other miscellaneous garbage that the network card has to tolerate. Well-shielded cables can help protect the card, but it's definitely one interface that's subject to a bit more electrical abuse than the rest.
Components that have been stressed beyond their tolerances a few times can result in things like signal filters having a lower noise threshold, which makes it harder for the card to pick out the signal from the noise, which leads to more packet loss. After enough abuse, the threshold drops below the usable level and communications halt.
There are lots of factors involved, such as shielding, proximity to nearby radiators, bend radius in cables, cable length, temperature, etc, etc. Whenever I delve into this world, I'm often amazed that anything works at all.
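If you want to catch that creeping packet loss before the link dies, something like the following sketch might do. It assumes you already collect periodic (sent, received) counts, for example from ping runs; the function names, the window of 3 samples, the 2x factor and the 0.1% floor are all hypothetical choices, not a recommendation.

```python
def loss_fraction(sent, received):
    """Fraction of packets lost in one sample."""
    return 0.0 if sent == 0 else (sent - received) / sent


def is_degrading(samples, window=3, factor=2.0, floor=0.001):
    """Return True if recent packet loss is clearly above the earlier baseline.

    samples: list of (sent, received) tuples, oldest first.
    Compares the mean loss of the last `window` samples with the mean of
    all earlier samples. `floor` stops tiny absolute loss from triggering.
    """
    if len(samples) <= window:
        return False  # not enough history to form a baseline
    losses = [loss_fraction(sent, received) for sent, received in samples]
    earlier, recent = losses[:-window], losses[-window:]
    baseline = sum(earlier) / len(earlier)
    current = sum(recent) / len(recent)
    return current > floor and current >= factor * max(baseline, floor)


# Hypothetical daily samples: loss creeps up over the last three days.
history = [(1000, 1000), (1000, 999), (1000, 1000),
           (1000, 995), (1000, 990), (1000, 980)]
print(is_degrading(history))
```

This only looks at the trend in loss; it says nothing about which component is at fault.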
failure modes are all over the map. sometimes they just start dropping more and more packets, sometimes it "looks like it's working" but there's no layer 1 link light, sometimes it's incredibly high latency, sometimes the entire card just disappears from view.
this mostly happens with the on-board controllers. add-in nics don't fail as often, but we do use high end nics (intel 10g and 4x 1g).
High-end consumer motherboards often include 2 integrated NICs. Over the last decade I've owned four and had one of the NICs fail after 2-3 years on every single motherboard. Glad to know it's endemic, and Danpat's explanation is fascinating.
> RAID controllers fail left and right. we keep tons of spares around.
Kind of scary. I would guess the replacement should be perfectly identical, to the last firmware bit (... and giving thanks that no subtle circuit timing factors are involved).
I mean, this is server hardware. One of the major differences between server hardware and desktop hardware is build quality. I've got a ten-year-old 1/2U rack server sitting in a closet that I bought for pennies at a surplus auction that still runs great.
We will probably have to wait 30 years to really know.
FWIW, lots of hardware from ~30 years ago still works. I have a 27 year old Amiga500 that still boots fine (many of the floppy disks have become unreadable, though).
You can buy fully working vintage computers much older than that on eBay.
To my knowledge, the most common guarantee target is 5 years. Equipment can last much longer or much shorter, both as a function of chance and of workload.
30 years would be exceptional if the machine spent its life at the 5-year-target load.
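To put rough numbers on the "chance" part, here is a back-of-the-envelope sketch. It assumes a constant annual failure rate, which is a simplification (real hardware tends to fail more often early and late in life), and the rates used are hypothetical, not measured figures.

```python
def survival_probability(annual_failure_rate, years):
    """Chance a machine is still alive after `years`, assuming each year
    carries the same independent failure probability."""
    return (1.0 - annual_failure_rate) ** years


# Hypothetical annual failure rates, for illustration only.
for afr in (0.01, 0.03, 0.05):
    p5 = survival_probability(afr, 5)
    p30 = survival_probability(afr, 30)
    print(f"AFR {afr:.0%}: 5 years {p5:.2f}, 30 years {p30:.2f}")
```

Under this model a rate that looks harmless over a 5-year target compounds into a much smaller survival chance over 30 years, which is the point being made above.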