by mrsilencedogood 887 days ago
My experience is that Linux is rock solid as long as you're not running it on super duper expensive hardware and doing crazy-big things on it.

Randomly in my career so far, notable kernel panic causes were:

- when a spark job finishes and deallocates close to a TB of memory, kernel panic. jobs using below 750GB were typically not seeing this happen, so it was something in there. this just kind of stopped happening after we updated the kernel and spark in a semi-unrelated push, so never really got a root cause here.

- bad hardware

- a spark job that was doing simply insane amounts of shuffle output (which goes to disk) was hitting kernel panics which ended up being related to a kernel bug that only impacted ridiculously high-disk-io-using applications, with some additional spin that made me think "ah so this is basically only affecting spark jobs"

- bad hardware

Did I mention bad hardware? I've spent way too much time hunting down "bugs" that ended up just being a bad mobo and linux was kind enough to inform you of it. But "this is the only program that causes the kernel panics!" and yet when we move it to a temp server for a few days the program mysteriously stops crashing. Another reason I do like "the cloud" - I can just cycle out an ec2 box I suspect is bad instead of fighting with the IT guy about whether the 2 year old expensive server is already busted or not.


Bad hardware is probably the main reason for Windows Bluescreens as well.
Corrupt registry hives, corrupt or missing OS files, or bad drivers are mostly the cause of Windows BSODs. Actually bad hardware is rarer, in my experience from my IT consulting days.
I've seen too many systems which started to work fine after replacing a PSU.

As someone who worked L1 and L2 - the major reason for BSODs is faulty hardware.

My favourite story on this topic: after ~4 human-hours of diagnostics by an L1 tech, I came to the client site, confirmed the BSOD, opened the case, straightened the SATA cable, and the OS installed successfully.

EDIT: another one is a cheap PSU that cut the power too fast on shutdown, so the HDD never wrote 'good shutdown' to the disk, triggering scandisk on startup. Fixed with a good PSU, BTW.

I thought the same. Till I bought a Surface (first version)... BOY that thing was unstable. That was the last chance I gave to Microsoft. After that I switched to Mac. Not coming back anytime soon.