Hacker News new | ask | show | jobs
by giobox 2831 days ago
Simple, to help prevent “bit rot”. The problem is exacerbated further in that many of us treat cloud sync services as backup, which they arguably aren’t - they can inconveniently just spread the decay.

I’d also hoped that a next generation file system from Apple would have had more to say on this topic, but it seems like features that promote their iOS device agenda took front seat over less “sexy” features like data integrity.

In the days before iOS devices dominated OS level decision making at Apple there was an assumption that Apple might adopt ZFS as their next generation file system, which is apparently much better in this regard. There’s various evidence of a cancelled MacOS ZFS project scattered throughout past MacOS releases.

> https://en.wikipedia.org/wiki/Data_degradation

> https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-ap...

2 comments

Word on the street is that Apple's ZFS integration was mostly finished and it was going to be announced at WWDC. Sun opensourced ZFS under the CDDL. But then Oracle bought Sun, and Apple's lawyers wanted to make sure Oracle wouldn't try to sue them over ZFS somehow anyway. Negotiations between Apple and Oracle for a clear ZFS license fell through. Without legal go-ahead the feature was pulled from macos at the 11th hour and buried.

When ZFS was opensourced under the CDDL, lots of people complained that they should have chosen a clearer, more permissive opensource license. Other people said it was fine, because the license was good enough and Sun is full of good people. The way everything played out, its clear the first group's concerns were valid.

Its a huge shame. ZFS is a fantastic piece of engineering. It was ahead of its time in lots of ways. It would take years for btrfs to become usable and for apfs to appear on the scene. If not for the weird licensing decision, zfs would almost certainly have landed in the linux and macos kernels. We almost had an ubiquitous, standard, cross platform filesystem.

For more history about Sun and Oracle, this talk by Bryan Cantrill is a great watch: https://www.youtube.com/watch?v=-zRN7XLCRhc

Might it not be the case that combating bit rot is best done at higher layers of the stack similar to how it is best done at levels higher than the IP layer in a networking stack?

For example, data painstakingly entered by the user a character at a time with a keyboard might deserve more redundancy than for example a movie downloaded by iTunes.

I’m no expert, but my understanding is that bits can “flip” and introduce errors due to things as unpredictable as background radiation etc, even on files the system has had no interaction with, which is why it’s kind of desirable to implement this kind of integrity check at the file system level. A higher level check may be completely unaware of this kind of passive background error.

Also, if I pull the drive and move it to another machine, again it’s kind of nice if the data integrity features are tied to the drive format rather than higher level software. I don’t think it’s too unreasonable to expect the file system to make sensible guarantees that the sequence of bytes I record today will remain the same until I next interact with them.

I’m not sure how appropriate comparisons with IP error correction is either; it’s a markedly different class of problem really (you are not dealing with long term storage issues at all).