| HN Mirror



	by nicky0 2828 days ago
	Care to enlighten ignorant me why we would want that?

3 comments

giobox 2828 days ago

Simple, to help prevent “bit rot”. The problem is exacerbated further in that many of us treat cloud sync services as backup, which they arguably aren’t - they can inconveniently just spread the decay.

I’d also hoped that a next generation file system from Apple would have had more to say on this topic, but it seems like features that promote their iOS device agenda took a front seat over less “sexy” features like data integrity.

In the days before iOS devices dominated OS level decision making at Apple there was an assumption that Apple might adopt ZFS as their next generation file system, which is apparently much better in this regard. There’s various evidence of a cancelled MacOS ZFS project scattered throughout past MacOS releases.

> https://en.wikipedia.org/wiki/Data_degradation

> https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-ap...


josephg 2827 days ago

Word on the street is that Apple's ZFS integration was mostly finished and was going to be announced at WWDC. Sun open-sourced ZFS under the CDDL. But then Oracle bought Sun, and Apple's lawyers wanted to make sure Oracle wouldn't try to sue them over ZFS somehow anyway. Negotiations between Apple and Oracle for a clear ZFS license fell through. Without legal go-ahead, the feature was pulled from macOS at the 11th hour and buried.

When ZFS was open-sourced under the CDDL, lots of people complained that Sun should have chosen a clearer, more permissive open-source license. Other people said it was fine, because the license was good enough and Sun is full of good people. The way everything played out, it's clear the first group's concerns were valid.

It's a huge shame. ZFS is a fantastic piece of engineering. It was ahead of its time in lots of ways. It would take years for btrfs to become usable and for APFS to appear on the scene. If not for the weird licensing decision, ZFS would almost certainly have landed in the Linux and macOS kernels. We almost had a ubiquitous, standard, cross-platform filesystem.

For more history about Sun and Oracle, this talk by Bryan Cantrill is a great watch: https://www.youtube.com/watch?v=-zRN7XLCRhc


hollerith 2827 days ago

Might it not be the case that combating bit rot is best done at higher layers of the stack similar to how it is best done at levels higher than the IP layer in a networking stack?

For example, data painstakingly entered by the user a character at a time with a keyboard might deserve more redundancy than, say, a movie downloaded by iTunes.


giobox 2827 days ago

I’m no expert, but my understanding is that bits can “flip” and introduce errors due to things as unpredictable as background radiation etc, even on files the system has had no interaction with, which is why it’s kind of desirable to implement this kind of integrity check at the file system level. A higher level check may be completely unaware of this kind of passive background error.

Also, if I pull the drive and move it to another machine, again it’s kind of nice if the data integrity features are tied to the drive format rather than higher level software. I don’t think it’s too unreasonable to expect the file system to make sensible guarantees that the sequence of bytes I record today will remain the same until I next interact with them.

I’m not sure how appropriate comparisons with IP error correction are either; it’s a markedly different class of problem really (you are not dealing with long term storage issues at all).
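To make the at-rest scenario above concrete, here is a rough Python sketch of per-block checksumming: checksums are recorded when data is written, and a later "scrub" recomputes them to spot a bit that flipped while nothing was touching the file. The block size, function names, and the use of CRC32 are assumptions for illustration only; this is not how ZFS or any particular file system actually lays out its checksums.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def checksum_blocks(data: bytes) -> list[int]:
    """Return one CRC32 per fixed-size block of data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


def scrub(data: bytes, stored: list[int]) -> list[int]:
    """Return indices of blocks whose current CRC32 differs from the stored one."""
    current = checksum_blocks(data)
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate silent corruption by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


original = bytes(range(256)) * 64        # 16 KiB of sample data, i.e. four blocks
stored_sums = checksum_blocks(original)  # recorded at write time

rotted = flip_bit(original, 5000, 3)     # byte 5000 falls inside block 1

print(scrub(original, stored_sums))      # prints []
print(scrub(rotted, stored_sums))        # prints [1]
```

Detection alone only tells you which block went bad; repairing it still needs a second good copy or some form of parity.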


scienceman 2828 days ago

Not OP, but probably to make sure the contents of a file are not changed by hardware errors.


jandrese 2828 days ago

Yep, otherwise you aren't able to detect bit level errors unless they impact the metadata, and the metadata is a tiny fraction of your total storage.

That said, your hard drive already does block level checksumming so doing it at the FS layer is mostly redundant unless the errors are being introduced in your SATA controller or on the PCI bus.


aaaaaaaaaab 2828 days ago

You would still need end-to-end integrity checking, unless your Mac came with ECC memory (which it probably didn't).


mcpherrinm 2828 days ago

Memory errors are still a concern; however, RAM is not used for persistent storage.

If a bit flip occurs during the path to storing data, that could get persisted. That's a moment in time, though. Maybe you'll notice the document you just wrote seems corrupted, or just has a typo.

But if you write successfully to disk, you are trusting that data to stay there long-term. If years later your drive corrupts a bit, you may have a very hard time noticing. Bad RAM manifests as computer instability and you can just replace RAM without data loss, as nobody is permanently storing data in RAM.

Because the data spends so much longer on disk than in RAM, the chance of a bit flip affecting stored data is correspondingly higher.


aaaaaaaaaab 2827 days ago

It takes bad luck for sure, but I once ruined a bunch (a big bunch) of my photos by syncing them to a NAS with faulty RAM. It was a Synology DS212 I think, back in 2012. Mind you, the device didn’t produce symptoms other than messing up regularly spaced bytes in the transferred files.


sneak 2827 days ago

I am super paranoid about this kind of stuff and don’t consider a copy finished until it is first done copying then also passes an independent rsync -c.
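For anyone who wants the same copy-then-verify habit in a script, a minimal Python sketch might look like the following. It plays the role that `rsync -c` plays above (comparing file contents by checksum rather than by size and timestamp), but it is not rsync's algorithm; the file names and the `copy_and_verify` helper are made up for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> bool:
    """Copy src to dst, then re-read both files and compare their digests."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "photo.raw"        # hypothetical file names
    dst = Path(tmp) / "photo-copy.raw"
    src.write_bytes(b"example payload" * 1000)

    ok = copy_and_verify(src, dst)
    print(ok)                            # prints True

    # Simulate a flipped bit on the destination and check again.
    damaged = bytearray(dst.read_bytes())
    damaged[10] ^= 0x01
    dst.write_bytes(bytes(damaged))
    still_ok = sha256_of(src) == sha256_of(dst)
    print(still_ok)                      # prints False
```

One caveat: a read issued right after a copy may be served from the operating system's cache rather than from the disk, so a verification pass run later, or from another machine, is a stronger check.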


cerberusss 2826 days ago

For my family photos, I create par2 files. The rest, I don't care so much.
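As a rough illustration of why recovery files help, here is the simplest possible form of parity in Python: one XOR block that can rebuild any single lost block. This is deliberately not how par2 works internally (par2 uses Reed-Solomon codes and can repair several damaged blocks); the block contents and the `xor_blocks` helper are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


blocks = [b"AAAA", b"BBBB", b"CCCC"]   # stand-ins for equal-sized chunks of a file
parity = xor_blocks(blocks)            # stored alongside the data

# Suppose block 1 is lost or known to be corrupt: XOR the survivors with the parity.
recovered = xor_blocks([blocks[0], blocks[2], parity])
print(recovered == blocks[1])          # prints True
```

Note that this only works when you already know which block is bad, which is why recovery data is usually paired with per-block checksums.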


aasasd 2827 days ago

I've recently listened through an old-but-good episode of the Hypercritical podcast with John Siracusa's informative rant about this very topic: http://5by5.tv/hypercritical/56


tonyedgecombe 2827 days ago

I've just been going through Siracusa's old OS X reviews; he does talk about this a lot, right back to the earliest days.
