Hacker News
by veidr 3651 days ago
What a great and valuable post, especially since this info is the result of talking to the APFS team at WWDC, and has not been published anywhere else yet.

Of particular interest (to me) was the "Checksums" section:

    Notably absent from the APFS intro talk was any mention of
    checksums....APFS checksums its own metadata but not user data.

    ...The APFS engineers I talked to cited strong ECC protection
    within Apple storage devices. Both flash SSDs and magnetic media
    HDDs use redundant data to detect and correct errors. The
    engineers contend that Apple devices basically don’t return
    bogus data.

That is utterly disappointing. SSDs have internal checksums, sure, but there are so many different ways and different points at which a bit can be flipped.

It's hard for me to imagine a worse starting point to conceive a new filesystem than "let's assume our data storage devices are perfect, and never have any faulty components or firmware bugs".

ZFS has a lot of features, but data integrity is the feature.

I get that maybe a checksumming filesystem could conceivably be too computationally expensive for the little jewelry-computers Apple is into these days, but it's a terrible omission on something that is supposed to be the new filesystem for macOS.


I agree; no checksumming of user data is very disappointing. If there were performance issues, they could build checksumming into the filesystem, but make it a volume-specific option. No checksumming on the watch, strong integrity guarantees on the Mac Pro.

Their filesystem goals are in some ways consistent with Apple's (marketing) vision: Users would never have terabyte libraries of anything, as the various iServices would (should) be hosting that stuff in the cloud (where one presumes it is stored on a filesystem that actually includes data integrity). Since users won't be storing much of anything locally, Apple needn't care too much about data integrity. This is, of course, nonsense.

The idea that Apple's storage devices are error-free is arrogant--but even assuming that were true, there can still be bit errors in the SATA/PCI bus, errors in memory, race conditions, gamma rays, etc. Apple uses ECC memory on their Mac Pro, so obviously someone still believes that sort of thing is possible.

I don't see why Apple couldn't just recommend that their pro users who have need of this sort of data integrity locally run their own server with FreeBSD + ZFS. Apple has really backed off on their attempts to market OS X Server to this crowd. Heck, they're probably using FreeBSD already if they need that much data integrity.
Here's the thing: everybody needs this sort of data integrity.

Literally nobody wants their files to be silently corrupted. ZFS made it much easier for nerds like us to attain very high levels of data integrity.

APFS was (and maybe still is?) a chance to make that the default for regular people.

Do checksums actually need to be in the filesystem, though? It does seem like an important feature, but couldn't they be done at a higher level, like the way Spotlight indexing works on the Mac today?
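For what it's worth, a user-space version of this is easy to sketch. The following is a minimal illustration only, not how Spotlight or any Apple tool works; the function names (`build_manifest`, `find_mismatches`) are made up for the example. It records a SHA-256 digest per file and later reports files whose content no longer matches. Note the catch: a scanner like this only notices damage when it next runs, and on its own it cannot tell a legitimate edit from corruption.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def find_mismatches(root, manifest):
    """Return sorted relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Usage would be something like running `build_manifest` once after a known-good state and `find_mismatches` on a schedule.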
It isn't just pro users.

With TB file systems, assuming you haven't outsourced everything to iCloud, data integrity matters. If you have, now you're trusting them not to screw up, ever.

From the movie or mp3 that mysteriously no longer plays, through to more important things - business data or family photos. I suspect many people have experienced bit rot, even if they don't recognise it as such. We've even reached a point where, with quoted drive figures, copying 2 TB from one drive to another will likely result in a bit flip (source - Ars ZFS+btrfs article a couple of years back).

Heck, most people have some level of data loss from an HDD or flash drive failure. Sometimes even when they tried to do all the right things. The only question is whether it was backed up. In the case of personal users, unlikely. Self healing could have been quite some selling point!

I have experienced many bitrotted mp3s in my day. Thankfully I've been able to replace them online. As for other files? I can't recall any that are now unable to open for mysterious reasons.

I also happen to run a home file server on FreeBSD + ZFS, though I don't think that machine has ECC memory so it is still technically vulnerable to corruption.

I hear they use RHEL nowadays.
Does it not matter anyway though? If the file lives locally for a while, and it rots there, the corrupt version will be synced back into the cloud and the corruption will spread. I admit the window of corruption will be smaller, but it will still be there, no?
Talking to the Apple engineers, it really didn't seem to be an issue of computation. They seemed genuine in their belief that they could solve data integrity with device qualification. While I asked them 100 questions they asked me 2: had I ever actually seen bit rot (yes), and what kind of drives did we ship with the ZFS Storage Appliance (mostly 7200 nearline drives).
That's dumbfounding. I know firsthand that a certain monthly-fee movie streaming service and the CDN I work for can tell anyone who wants to hear it about handling silent corruption and bit rot, and we have relatively small fleets. At home ZFS saved me from a faulty power supply on my old workstation.

And... the red herring here is, Apple users will want to plug in third-party storage. There's just no way to contain what someone will plug in to USB and Thunderbolt, and it's insane to think APFS would not be ready to help there.

That would suggest that APFS is only relevant for internal storage procured by Apple. Do they not intend for it to be used on external storage?
They mentioned that it would be used on removable media as well.
If the crypto layer has proper MACs then presumably checksums at lower layers aren't so important. Did they give you much indication that they thought disk encryption would become standard?
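A rough sketch of what "proper MACs" buys you, using only Python's standard `hmac` and `hashlib` modules. This is an illustration under assumptions, not APFS's actual on-disk scheme: the `seal`/`open_sealed` helpers are hypothetical names, and the block is left unencrypted to keep the example short. The point is only that a keyed tag stored alongside a block turns a silent bit flip into a detectable error.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def seal(key: bytes, block: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to a block (hypothetical helper)."""
    tag = hmac.new(key, block, hashlib.sha256).digest()
    return block + tag


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Return the block if its tag verifies, otherwise raise ValueError."""
    block, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, block, hashlib.sha256).digest()
    # compare_digest does a constant-time comparison of the two tags.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tag mismatch: block was modified or corrupted")
    return block
```

Like a plain checksum, this detects damage but cannot repair it.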
I've had an Intel S3500 brick within 4 weeks and a SanDisk Extreme Pro start to show occasional I/O errors after a few months. The latter doesn't just lead to bit rot, but unreadable files. With ZFS I was able to identify those with a quick zpool scrub, which shows how valuable checksumming is even in the absence of ECC memory. At least according to my anecdotal experience, flash is much more flaky than conventional hard disks, so the assumption that stuff just doesn't happen seems ludicrous.
A lot of the CoW, WAFL, and snapshot patents that Network Appliance filed in the late 1990s have expired, or are expiring this year.

For example, https://www.google.com/patents/US6289356 was filed in 1998, so I presume it's expiring fairly soon. Given that some of the original lawsuits were Network Appliance suing Sun/Oracle, I'm wondering how much of a role this played in the timing of the release of these features? After all, Apple could pretty much pick a window to release a new file system - nothing special about 2016, that they couldn't have done this in 2015 or 2017...

Which makes me wonder if there are data integrity patents that will expire, at which point Apple can drop the functionality into APFS. After all, they did say during their presentation that the flexibility of the data format is one of the key design features of APFS.

Woo hoo!!!! I love your theory.

No idea if you're right, but it makes Apple's otherwise baffling stance plausible.

6289356 also lists a "priority date" on the google page of 1993. If that is an actual "priority date" rather than a google metadata "add on" then this patent expired (absent any term extensions) on June 3, 2013.
> The engineers contend that Apple devices basically don’t return bogus data.

It's much easier to pretend that this is the case when the file system isn't verifying it.

Checksumming would probably expose problems that would otherwise go unnoticed by users or be blamed on computer gremlins. It's hard to say if doing the "correct" thing here would improve the subjective user experience. Maybe putting on airs of infallibility is the more profitable route.

Good point. Going by the tone of Apple engineers' response, it sure does sound like they are going for plausible deniability.
Agreed. What an arrogant attitude, especially considering their MacBook Pro (2015) recently had a corruption bug which necessitated a firmware update: https://support.apple.com/kb/DL1830?locale=en_GB

Good checksumming to detect bit rot is exactly what is needed since as an owner of said laptop I have NO idea whether any of my data was affected.

If Apple want to say 'the majority of our devices are mobile and checksumming puts a large performance overhead' then that's one thing. But to claim it's not needed is just plain wrong and makes me worry that Apple's product managers sit in an echo chamber hearing only what they want to hear.

> The engineers contend that Apple devices basically don’t return bogus data.

Holy shit.

I guess that explains why my Mac recently had a bunch of daemons burning all CPU, crashing repeatedly in a tight loop when getting SQLite errors on a db in ~/Library. Cause disk corruption never happens.

Hmm, I guess like there is the "sufficiently smart compiler" fallacy, there is now the "sufficiently reliable hardware" fallacy.

But on another level, I guess if hardware fails, then well, you buy more hardware, which is good for Apple. Presumably people who bought in the past from Apple won't turn around and buy an Acer or HP laptop. They'll still buy Apple.

If hardware fails silently, you won't buy more hardware. You'll just come across something odd and say, hmm, typical Apple-bugginess.

It would be much nicer if your computer said, “I've detected a bit flip, please restore this file from backup”

Even more fun. Data gets corrupted, and backups pick it up, and start overwriting good backups with it eventually.
If your backup system involves overwriting old backups, it's not a backup system. It's a data loss system.
Unless you have infinite storage, you'll have to overwrite some backups at some point in the future.
Storage is not that cheap yet.
Making backups more granular means you remove sets of backups (or you collapse incremental backups). If a new backup causes corruption to back-propagate then it's not a backup.
It is, if you outsource the storage. Backblaze is $5 a month per computer for (virtually) unlimited storage. They keep old copies of files for 30 days.
Even better: automatically restore the file from backup.
The failure mode of hardware is not all or nothing. A single sector in an HDD or a single page in an SSD can fail and the rest can be fine for years; both HDDs and SSDs have many spares for this expected condition.
When all you have is Apple’s Disk Utility.app, all storage media is perfect. That was irony. Truth is hard drives can have more than 30 bad blocks and still have a verified S.M.A.R.T. status in their app.

I recommend sending every file system engineer on a year-long journey as a traveling system integrator.

If the storage is 100% reliable, why do they checksum the metadata?
i think it makes a lot of sense. turning on these kinds of checks can be scary. the current situation is mostly no one is affected by bit rot. this is probably because when it rarely happens it flips some bits that don't really matter anyway. but as soon as you turn on checksumming in software without any automatic error correction people are going to start freaking out when their files become inaccessible or they have to jump through some hoops to access the 'corrupted' file which looks entirely fine to them anyway.

same deal with some heap protections. say you are running a kernel which doesn't have byte patterns to detect heap overflows or use after free. maybe you have some heap overflows which because of their nature never cause any corruption but now you turn on heap protections and people's kernels are getting more panics :/

What is the user experience for when a checksumming filesystem detects an error?

If the fs detects a bit error does it flag the file as entirely unreadable? Move it to lost+found? Force me to restore the file from a backup? All these options seem more scary for an end user than blissful ignorance.

Don't misunderstand me, I've lost a few family photos over the years due to bit rot. So, I appreciate a fs that offers more protections. But, I honestly don't know offhand how an end user would recover from an error in /System or even an error in a family photo, or for that matter a word doc.

> If the fs detects a bit error does it flag the file as entirely unreadable? Move it to lost+found? Force me to restore the file from a backup?

For files stored in iCloud Drive, if that version of the file exists in the cloud, the OS could automatically re-fetch the file. But, yeah, for lots of circumstances there's not going to be a "good" option to give the user.

EDIT: Same applies to Time Machine (or whatever Apple's backup solution will be called in the APFS era).

It was a stealthy feature addition that went totally unannounced, but as of 10.11, Time Machine stores file checksums in the backup. See 'tmutil verifyChecksums'.
Perhaps you (or Apple) would still be able to achieve the checksum feature by a smart choice of encryption algorithm?

APFS has file-level encryption, so you would in theory be able to detect a flip by selecting an encryption algorithm that gives an error upon decrypting modified data. I could see this being worked into fsck_apfs at some point.

A similar case could be made for adding it into the compression algorithm, which the OP thinks will be coming to APFS later; popular algorithms such as deflate already have this built in.
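To make the compression point concrete: strictly speaking the raw DEFLATE bitstream carries no checksum, but the common containers around it do (zlib streams end with an Adler-32 of the uncompressed data, gzip with a CRC-32). A small sketch with Python's `zlib` module follows. Level 0 is used only so the payload is stored uncompressed and the flipped byte is guaranteed to land in the data; nothing here reflects what APFS compression would actually do.

```python
import zlib


def passes_integrity_check(stream: bytes) -> bool:
    """Return True if the zlib stream decompresses and its checksum matches."""
    try:
        zlib.decompress(stream)
        return True
    except zlib.error:
        return False


original = b"family photo bytes " * 64

# Level 0 stores the data uncompressed inside a zlib container,
# which still ends with an Adler-32 checksum of the original bytes.
stream = zlib.compress(original, 0)

# Simulate bit rot: flip a single bit in the middle of the stored payload.
damaged = bytearray(stream)
damaged[len(damaged) // 2] ^= 0x01

print("intact stream ok: ", passes_integrity_check(stream))
print("damaged stream ok:", passes_integrity_check(bytes(damaged)))
```

With real compression levels a flipped bit usually shows up either as a malformed stream or as a checksum mismatch; both surface as `zlib.error`.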

Same thoughts here as well, but how does encryption correct data?
Correcting data is much harder, and would require a significant amount of additional storage to provide enough redundancy to be able to deduce the original data. But detecting is good enough for many uses; you would be able to restore the files from Time Machine before those get silently corrupted as well.
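The simplest form of that redundancy is a full second copy, which is loosely what a mirrored ZFS pool does: verify the checksum on read, and if one copy fails, serve and repair from the other. A toy sketch follows. `MirroredStore` is a made-up in-memory class for illustration and has nothing to do with any real filesystem's layout; it also makes the storage cost obvious, since every block is kept twice.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex SHA-256 of a block."""
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Toy block store: two copies of each block plus one checksum."""

    def __init__(self):
        self.copies = ({}, {})  # two independent "devices"
        self.sums = {}          # block id -> expected checksum

    def write(self, block_id, data: bytes):
        for device in self.copies:
            device[block_id] = data
        self.sums[block_id] = checksum(data)

    def read(self, block_id) -> bytes:
        expected = self.sums[block_id]
        for device in self.copies:
            data = device[block_id]
            if checksum(data) == expected:
                # Found a good copy: overwrite any copy that fails its check.
                for other in self.copies:
                    if checksum(other[block_id]) != expected:
                        other[block_id] = data
                return data
        # Detection without redundancy left: the caller needs a backup.
        raise OSError("all copies failed verification; restore from backup")
```

When both copies are bad, all this can do is report the error, which is exactly the detect-only case described above.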
Checksums are usually very fast to compute as the ARM CPUs of any modern phone have crypto engines, and their laptops do as well. I think trading data protection for performance reasons would be pretty irrational.
Except that with NVMe drives, and the parallel operations you can run on these, the performance of checksums becomes important again. Recent experiments with HAMMER2: https://www.dragonflydigest.com/2016/06/15/18281.html
Yeah, the ever present march of storage --> memory has really put a strain on our current compute architectures. Thanks for the note, will be reading more about it.
> ZFS has a lot of features, but data integrity is the feature.

And in the Sun era you might be prepared to bet your business on not being on the wrong end of a lawsuit from the owner of the various patents and copyrights around Sun IP.

Only a completely insane person would argue that's a good idea now.

External hard drives / SSDs that are using APFS? I thought that would be a pretty obvious use case.
Either the engineer is young, or hasn't been doing systems programming for long enough.
Perhaps they assume that you will sync everything to iCloud anyway (?)
How are you expected to know you need to restore from the backup if the damage is silent?

How can you have confidence in your backup if damaged data can be silently written to it?
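One partial answer, sketched here as a heuristic rather than anything a real backup product is known to do: if a backup tool keeps a digest, modification time and size for each file from the previous run, then content that changed while the timestamp and size did not is suspicious and can be held back instead of overwriting the last good copy. All names below (`FileRecord`, `classify`, `plan_backup`) are hypothetical, and the heuristic is easy to fool, since programs can reset timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    digest: str    # content hash recorded at scan time
    mtime_ns: int  # modification time reported by the filesystem
    size: int      # file size in bytes


def classify(previous: FileRecord, current: FileRecord) -> str:
    """Label a file as 'unchanged', 'modified', or 'suspect'."""
    if current.digest == previous.digest:
        return "unchanged"
    if current.mtime_ns == previous.mtime_ns and current.size == previous.size:
        # Content changed but nothing claims to have written the file.
        return "suspect"
    return "modified"


def plan_backup(previous: dict, current: dict):
    """Split paths into those safe to copy and those held back for review."""
    to_copy, held_back = [], []
    for path, record in sorted(current.items()):
        old = previous.get(path)
        verdict = "modified" if old is None else classify(old, record)
        if verdict == "suspect":
            held_back.append(path)
        elif verdict == "modified":
            to_copy.append(path)
    return to_copy, held_back
```

A tool built this way would keep the older backup of every held-back file and surface the list to the user.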