by dinvlad 1231 days ago
Worth checking if you have any thermal issues with it. Mine failed in a similar way, presumably due to a rookie mistake: I forgot to remove the tape from the thermal pad on the mobo.

It's unlikely that thermal issues would cause poor reliability on these drives. At worst you could expect intermittently bad performance. You can check for this condition with `nvme smart-log`. If your device had often overheated, its "critical composite temperature time" would be non-zero. My Samsung, which has been in service for years and has no thermal solution, has a value of 1 minute, and I happen to know that is because I heated it with a hair dryer to find out what would happen if it crossed the critical temperature.
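
If you'd rather script the check than eyeball the output, here is a minimal Python sketch. It assumes you have saved the output of `nvme smart-log <device> --output-format=json` as text, and that the JSON uses the key names `warning_temp_time` and `critical_comp_time` (that is what I recall from nvme-cli, so verify against your version). The function names and the sample data are made up for illustration.

```python
import json

# Assumed key names from nvme-cli's JSON smart-log output;
# check them against what your nvme-cli version actually prints.
WARNING_KEY = "warning_temp_time"
CRITICAL_KEY = "critical_comp_time"


def thermal_minutes(smart_log_json: str) -> dict:
    """Return the minutes the drive reports above its warning and critical thresholds."""
    log = json.loads(smart_log_json)
    # Missing keys are treated as zero rather than raising.
    return {
        "warning_minutes": int(log.get(WARNING_KEY, 0)),
        "critical_minutes": int(log.get(CRITICAL_KEY, 0)),
    }


def was_overheated(smart_log_json: str) -> bool:
    """True if the drive has ever logged time above its critical temperature."""
    return thermal_minutes(smart_log_json)["critical_minutes"] > 0


# Hypothetical sample, not output from a real drive.
SAMPLE = '{"warning_temp_time": 12, "critical_comp_time": 1}'

if __name__ == "__main__":
    print(thermal_minutes(SAMPLE))
    print("overheated:", was_overheated(SAMPLE))
```

A non-zero warning time with zero critical time would suggest the drive ran warm and throttled but never crossed the critical threshold.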
"I happen to know that is because I heated it with a hair dryer to find out what would happen if it crossed the critical temperature."

Ah this is a fantastic and true hacker mindset :)

Willing to tamper with fairly expensive equipment just for the heck of it.

Ha, interesting! Makes sense, the drive is supposed to just throttle itself before it can reach unsafe temps. I’ll def try to check, didn’t know the drive recorded that - thanks for the tip. In any case, now I know an RMA is in order.
The controller is thinner than the NAND flash, so it doesn't make proper contact with the thermal pad. I just discovered mine is affected by this. After heavy reads the controller is at 67C while the NAND is at 42C.

https://www.youtube.com/watch?v=I8Z09nU554Q

Hmm, that still seems like it should be OK. Tjmax is usually over 100C (though for NAND they recommend 70C, I think).