Hacker News
by abcd_f 1754 days ago
> A quick googling indicated a hard drive problem

In the vast majority of cases spontaneous data corruption happens in _transit_ due to RAM glitches.

All modern drives implement forward error correction on a per-sector basis. This allows the drive to automatically repair up to 10% of damage to any given sector, in which case the correct data is returned to the requestor and the sector is either tagged for relocation or relocated right away. When the data can’t be cleanly recovered, the read request fails.

That is, the chance of a read request returning mangled data from a disk is next to zero. In turn, if you do see data corruption, it happened before the data hit the disk, i.e. it happened in transit.

1 comments

In the vast majority of cases, corruption (that doesn't involve drive failure) happens due to bad or naive implementations of software that handles files.

While memory does fail sometimes, if it were failing at the rate you describe, PCs would not be suitable for any work at all.

I have worked in ops for many years. A lot of software that copies files is perfectly happy to leave you with a copy of a different length than the original.
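For what it's worth, here is a minimal sketch of a copy that refuses to report success when the result differs from the source. The names `sha256_of` and `copy_verified` are made up for illustration, and the approach (compare sizes, then compare SHA-256 digests) is just one reasonable way to do it, not what any particular tool does.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst):
    """Copy src to dst and raise OSError if the copy does not match."""
    shutil.copyfile(src, dst)

    # Cheap check first: a truncated copy shows up as a size mismatch.
    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)
    if src_size != dst_size:
        raise OSError(f"size mismatch: {src_size} != {dst_size} bytes")

    # Full check: compare content digests of source and destination.
    if sha256_of(src) != sha256_of(dst):
        raise OSError(f"checksum mismatch between {src} and {dst}")
    return dst
```

One caveat: the read-back of `dst` may be served from the OS page cache, so this catches truncated or mangled copies at the software level but does not prove what actually landed on the disk.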

Software bugs are an altogether different issue, and a far more exotic one.

The context of the OP's post and my reply is the case where copying in bulk with a mature tool yields corruption in a small fraction of the data. In this case the cause is in-transit corruption (rather than at-rest corruption, which is a fairly common belief dubbed the “bitrot” phenomenon).

Software bugs exotic? That is funny.

File IO is one of the more error-prone activities, and one that is poorly understood by the general software dev community.

It is actually rare to see a piece of code that is NOT broken.

I once observed a piece of software write a million files, all empty, because it could not handle the situation where it could create a file but not write to it.

It is quite normal for files to be truncated because they are written directly to the destination file rather than through a temp file. Then the script gets killed, and the next instance assumes the file exists, so it resumes from the next one.
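A rough sketch of the temp-file approach, for illustration: write to a temporary file in the same directory, flush it to disk, and only then rename it over the destination. The function name `write_atomic` is hypothetical. The point is that the destination path either holds the complete new content or is untouched, so a killed script never leaves a half-written (or empty) file that the next run mistakes for a finished one.

```python
import os
import tempfile


def write_atomic(path, data: bytes):
    """Write data to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device before renaming
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure (including a write error): remove the temp file, leave path alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that `tempfile.mkstemp` creates the file with owner-only permissions, so real code may need to set the mode explicitly. Surviving a power loss right after the rename would also need an fsync on the directory, which is left out here.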

Most software can do the job if everything works, but almost none can correctly handle every error condition.