by onetimeuse92304 886 days ago
I have been using Linux since 1999. I have seen lots of kernel panics, but far fewer recently. Unfortunately, they have been replaced by more problems in the platform and userspace.

I know Linux works more reliably for some people and less reliably for others. It probably has much to do with what you do with it: what kind of hardware you are running it on, and whether you just install it and use it as it is or you are the kind of person, like me, who likes to change everything to his liking.

I also tend not to reinstall my machines. For about 15 years my daily driver was a single Debian unstable installation, which was continuously updated until I faced too many problems and had to completely replace it. I would have fixed it all, but I just did not have the time and I needed it working.


My experience is that Linux is rock solid as long as you're not running it on super duper expensive hardware and doing crazy-big things on it.

Randomly in my career so far, notable kernel panic causes were:

- when a Spark job finishes and deallocates close to a TB of memory, kernel panic. Jobs using below 750GB typically did not see this happen, so it was something in there. This just kind of stopped happening after we updated the kernel and Spark in a semi-unrelated push, so we never really got a root cause here.

- bad hardware

- a Spark job that was doing simply insane amounts of shuffle output (which goes to disk) was hitting kernel panics, which ended up being related to a kernel bug that only impacted applications with ridiculously high disk I/O, with some additional spin that made me think "ah so this is basically only affecting Spark jobs"

- bad hardware

Did I mention bad hardware? I've spent way too much time hunting down "bugs" that ended up just being a bad mobo, and Linux was kind enough to inform us of it. You hear "this is the only program that causes the kernel panics!" and yet when we move it to a temp server for a few days, the program mysteriously stops crashing. Another reason I do like "the cloud" - I can just cycle out an EC2 box I suspect is bad instead of fighting with the IT guy about whether the 2 year old expensive server is already busted or not.

Bad hardware is probably the main reason for Windows bluescreens as well.

A corrupt registry hive, a corrupt or missing OS file, or bad drivers are the cause of most Windows BSODs. Actual bad hardware is rarer. That was my experience during my IT consulting days.

I've seen too many systems that started to work fine after replacing a PSU.

As someone who worked L1 and L2 - the major reason for BSODs is faulty hardware.

My favourite story on this topic is when, after ~4 human hours of diag by an L1 tech, I came to the client site, confirmed the BSOD, opened the case, straightened the SATA cable and the OS installed successfully.

EDIT: another one is a cheap PSU that cut the power too fast on shutdown, so the HDD never wrote 'good shutdown' to the disk, triggering scandisk on startup. Fixed with a good PSU, BTW.

I thought the same. Till I bought a Surface (first version)... BOY that thing was unstable. It was the last chance I gave to Microsoft. After that I switched to Mac. Not coming back anytime soon.