Hacker News
by eafer 1775 days ago
> compatible with macOS filesystems (HFS+ and APFS)

How far along is this? I think she's underestimating how hard it is to implement a modern filesystem that won't eat users' data. I've been working on a Linux APFS driver[0] for several years, and it's not fully functional yet. It's a pity that she is working with FreeBSD, or it could have been of use to her.

[0] https://github.com/eafer/linux-apfs-rw

5 comments

> It's a pity that she is working with FreeBSD, or it could have been of use to her.

I suppose FreeBSD made more sense as a base considering MacOS is derived from BSD.

FreeBSD differs quite substantially from the XNU kernel used by MacOS because XNU is based on Mach, and it was forked (Edit: from 4.3BSD) in 1988 - before Linux even existed.

The XNU kernel does not have a stable syscall ABI so perhaps it doesn't matter if the syscalls are different because the implementation of libSystem can convert as appropriate in userspace (see also: WINE).

> MacOS because XNU is based on Mach, and it was forked (Edit: from 4.3BSD) in 1988

As another commenter noted: XNU = Mach + FreeBSD.

What you are referring to is what NeXTSTEP was, Mach + BSD4:

* https://en.wikipedia.org/wiki/NeXTSTEP#Unix

By the time Apple got to them it was a few years later, and so they decided to update that part of the kernel, and also brought in FreeBSD's userland.

>XNU is based on Mach

Partially, XNU is Mach AND the FreeBSD-Kernel:

https://www.youtube.com/watch?v=-7GMHB3Plc8

If you look at the source of a lot of drivers on MacOS a few years ago, they were heavily based on the FreeBSD drivers.
Yes, sure they do, just look at the presentation.
Could you perhaps relicense your implementation so it could be used outside of Linux?
I don't think I can, I'm using gpl code from other parts of the kernel. I'm not sure I would want to either, I put a lot of work into this and the gpl gives me more of a feeling of ownership.

That said, there's nothing stopping you or anyone else from reworking my code into a (gpl-licensed) FUSE driver. I don't think it's a straightforward task, but it can definitely be done.

I don't think it's a licensing issue. It's an implementation issue, as the two kernels use different syscalls and have different architectures.
Syscalls are mostly the same, but indeed, the interface between the kernel and the file systems is very different. However, code which implements that interface on the file system is a relatively small part of the whole thing; most of the code should be reusable.

Historical note: FreeBSD used to support XFS; I believe it was ported from Linux.

It takes a lot of work to get the Linux GPU drivers to build for other operating systems.
True. Although it’s way easier than it used to be, thanks to the linuxkpi layer, the piece of the FreeBSD kernel which implements various Linux kernel APIs.
A lot of us have long been tired of the "all things must be Linux" mantra. It is nice having options, having other environments.
100% correct, and I think FreeBSD is woefully underrated.
> how hard it is to implement a modern filesystem that won't eat users' data.

Why is that so hard? Because of edge-cases? Caching/Timing considerations?

The main problem is simply that people really really really don't like losing their data after they saved it to disk. A simple app that corrupts its in-memory state once a year is probably acceptable. A filesystem that corrupts its on-disk state once a year is pure garbage. You basically need to aim for zero bugs.

How hard this is depends on the filesystem. Something like FAT, for example, is pretty much designed for ease of implementation, with few edge cases. Modern filesystems are not like that at all: the data structures are very complicated, so they must be extremely well tested before they are good enough to use. That would probably require an fsck to check for subtle inconsistencies; in the case of APFS you can use mine, but it's still very incomplete. Apple's published fsck is not very thorough.

As an example of the kind of problems to expect, I recall a bug in the Linux HFS+ driver. If you had a drive with lots of short filenames and lots of long filenames, and you started deleting the short filenames, eventually you would lose half of your files. This kind of thing happens because HFS+ has variable-length keys in the index nodes of its trees, so deleting a record may trigger a complicated cascade of node splits. APFS inherited this feature, and it was very annoying to implement.

But HFS+ is very well documented; APFS is not, and that doesn't help.
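
To make the variable-length key problem above concrete, here is a minimal Python sketch. It is not the HFS+ or APFS on-disk format: the node capacity, the one-byte length prefix, and the function names are all assumptions made up for illustration. It only shows why deleting a short key can force an index node to split: the parent stores the first key of each child, so removing that key promotes the next one, which may be much longer.

```python
NODE_CAPACITY = 32  # bytes available for keys in one index node (made-up figure)


def node_size(keys):
    """Bytes used by a node's keys: one length byte plus the key itself."""
    return sum(1 + len(k) for k in keys)


def replace_separator(index_keys, old_key, new_key):
    """Swap a separator key in an index node; split the node if it overflows.

    Returns a list of one node (it still fits) or two nodes (it had to split).
    """
    keys = [new_key if k == old_key else k for k in index_keys]
    if node_size(keys) <= NODE_CAPACITY:
        return [keys]
    middle = len(keys) // 2
    return [keys[:middle], keys[middle:]]


def delete_first_record(leaf, index_keys):
    """Delete the smallest record in a leaf and fix up the parent index node."""
    old_first = leaf.pop(0)
    new_first = leaf[0]
    return replace_separator(index_keys, old_first, new_first)


# The index node holds the first key of each of its three leaves.
leaf = ["a", "a_very_long_file_name.txt", "b"]
index = ["a", "c", "d_medium_name"]
assert node_size(index) <= NODE_CAPACITY  # fits before the deletion

# Deleting the short key "a" promotes the long key into the index node,
# which no longer fits and has to be split in two.
nodes_after = delete_first_record(leaf, index)
```

In a real tree the split would add a new key to the grandparent, which can overflow in turn; that is the cascade, and it is triggered by a deletion, which with fixed-length keys would never grow a node.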

It's worse than this. You need to aim not just for zero bugs, but for zero bugs despite working with hardware that can degrade with use and whose firmware often does have bugs.
And yet this didn't stop Apple from automatically converting HFS+ volumes to APFS in iOS 10.3 and macOS 10.13.0 soon after the APFS beta dropped in macOS 10.12.5 and it didn't stop Apple from requiring APFS for all volumes in macOS 10.14+. Apple must have been pretty confident that APFS was working reliably to be so bold.
Not sure why you are telling me this, I don't know anything about Apple's internal development process. I assume they did run a lot of tests. But I recall at least one serious bug early on too[0].

[0] https://www.theregister.com/2018/02/16/apple_file_system_bug...

HFS+ is open source, so you don’t even need to rewrite it from scratch.
The big problems continue to be

• C being a shitty language that does not force or even encourage programmers to handle errors

• implementation knowledge about file system technology is generally stuck in the 1990s

• disk controller hardware lying to the OS to make them appear more performant than they really are

visit https://danluu.com & ctrl+f "files"
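
As a userspace illustration of the error-handling point (in Python rather than C, and at the application level rather than inside a filesystem), here is a minimal sketch of a write that checks every step. The function name `durable_write` is hypothetical. Note that even a successful fsync only means the OS asked the device to persist the data; it cannot help if the controller lies, which is the third bullet.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, checking every step that can fail or fall short."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested; loop until done.
            written += os.write(fd, data[written:])
        # Ask the OS to push the data to the device. Failures raise OSError
        # rather than coming back as a return code that is easy to ignore.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.bin")
    count = durable_write(target, b"hello, disk")
    with open(target, "rb") as f:
        contents = f.read()
```

The equivalent C code has to test the return value of each call by hand, and nothing in the language complains if it doesn't.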

> C being a shitty language that does not force or even encourage programmers to handle errors

I don't see where this is coming from. Most of the world's top filesystems are written in C, and they work just fine. Maybe other languages could get better results, but it's hard to say with so little data.

> implementation knowledge about file system technology is generally stuck in the 1990s

If you are talking about me, that might be true, I'm relatively new to this and still learning. But there's definitely people out there with some serious "implementation knowledge". And tools like xfstests did not exist in the 1990s, that makes a huge difference.

I'm curious why you think file systems written in C would somehow be better than any other type of program written in C. We have plenty of data suggesting C programs have more bugs and vulnerabilities than programs written in safer languages.
If you are thinking about rust, we don't have any data about it in the context of the kernel yet, not even for device drivers. It may prevent some exploitable bugs, but those aren't a big concern for filesystems - otherwise they wouldn't be put inside the kernel at all. The reality is, we don't know if it would help; and given how conservative we all are with our filesystem implementations (for good reasons), it's possible that we never will.
I'm not thinking about anything specific, I'm simply disputing your general claim that moving away from C would not help, when it has clearly helped in every single other domain of software development. I see no real reason why file systems should be any different, but clearly you do, so I was asking you why you think file systems would be different.

And for specific examples of options with better safety records, then sure Rust would be one possibility, as would Ada, or Frama-C if you need to stick with C.

Can you elaborate on the second point?
courses offered by polytechnics and online learning platforms not being up-to-date, restricted and uneven access to expert implementers
I think they are underestimating in general how much work it is. ReactOS has been developed for 23 years and is not in a usable state yet.
However ReactOS has a more challenging/bigger goal for a few reasons. It aims to be binary compatible, not just source compatible. Much more of the underlying tech is proprietary while the BSD subsystem on macOS is free for them to use.
"eventual compatibility with x86-64 macOS binaries (Mach-O) and libraries"
> It's a pity you are working with Linux, or it could have been of use to her.

FTFY

Oh, I wasn't trying to diss FreeBSD. It would probably have been better to write my driver for fuse to make it portable, but it's too far along at this point.
Out of interest, why can't file system implementations be portable? A shared core that exists as a library that can be tested in userland, and which can then be used by a lightweight shim that implements the kernel-facing interfaces. But I haven't seen any file systems implemented this way.
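
The layout being asked about can be sketched in a few lines of Python. Everything here is hypothetical: the class names, the toy one-file-per-block format, and the dictionary-based shim are invented for illustration and do not correspond to any real filesystem or kernel interface. The point is only the separation: the core talks to an abstract block device, so it can be tested in userland against an in-memory one, and the shim is the only part that knows about the host.

```python
class MemoryDevice:
    """In-memory stand-in for a disk, used to test the core in userland."""

    def __init__(self, block_count, block_size=16):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(block_count)]

    def read_block(self, number):
        return self.blocks[number]

    def write_block(self, number, data):
        assert len(data) == self.block_size
        self.blocks[number] = bytes(data)


class ToyFsCore:
    """Portable core: knows the on-disk format, nothing about any kernel.

    Toy format (invented): one file per block, first byte is the length.
    """

    def __init__(self, device):
        self.device = device

    def store(self, block_number, payload):
        if len(payload) > self.device.block_size - 1:
            raise ValueError("payload too large for one block")
        padding = bytes(self.device.block_size - 1 - len(payload))
        block = bytes([len(payload)]) + payload + padding
        self.device.write_block(block_number, block)

    def load(self, block_number):
        block = self.device.read_block(block_number)
        return block[1:1 + block[0]]


class DictShim:
    """Thin adapter mapping a host-specific interface (here: names) to the core."""

    def __init__(self, core):
        self.core = core
        self.names = {}

    def write_file(self, name, payload):
        # Reuse the file's block if it exists, else take the next free one.
        block = self.names.setdefault(name, len(self.names))
        self.core.store(block, payload)

    def read_file(self, name):
        return self.core.load(self.names[name])


shim = DictShim(ToyFsCore(MemoryDevice(block_count=4)))
shim.write_file("notes", b"hello")
shim.write_file("notes", b"bye")
result = shim.read_file("notes")
```

A kernel port would replace `DictShim` and `MemoryDevice` with code written against that kernel's own interfaces, leaving `ToyFsCore` untouched. How thin that shim can really be kept is exactly what the replies below debate.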
The irony is that most of the newer ones in Linux are written that way, just nobody writes a FUSE binding to them. For example, both Btrfs and XFS contain fully functional pure-userspace implementations of the filesystem inside their userspace code (libbtrfs and libxfs both contain the complete filesystem code), but nobody has written a FUSE binding to either.

Btrfs' implementation is even used to do most of the functionality in the tools (like send/receive) rather than using the kernel to do it (in order to minimize context switches that reduce I/O performance).

FUSE is neat for one-off access to obscure file systems, and FUSE is usable for things like sshfs for those who use that, but it is my impression from having used FUSE myself that FUSE does come with a noticeable performance penalty. Is my perception wrong on this? Because if it is not then I’d be hesitant to use a system where the main storage was relying on FUSE.
FUSE is also great for slower media. Copying files over USB sticks without being restrained by FAT-32 limitations is great. There have been times that I've used NTFS-3g for this because its FUSE implementation meant I could read it on any OS.
Stuff like gocryptfs seems to push FUSE to quite high performance. There is still some latency, but streaming read/write seems to go fast.
This is also how the original BSD UFS/FFS was developed some three decades ago.
While the kernel interfaces aren't exactly wildly disparate, they are perhaps surprisingly diverse (and coupled to kernel implementation details) for something as boring as a filesystem. At least some of this is for performance reasons. Accessing offsets into kernel structures is usually going to be faster than copying the data to use on your own.

On top of that, since there is no single standard interface in the kernel, kernel maintainers are skeptical of adding shims to the mainline kernel. The Linux kernel developers are perhaps the most famous for being vocal about this, but they are not unique.

That being said, there are portable file system implementations that use FUSE. See e.g. NTFS-3g[1]

1: https://en.wikipedia.org/wiki/NTFS-3G

Shims in the kernel seem a bit anti-social, since all module developers are expected to contribute to the shared core code. Of course it can be done, and it has been done. But I think FUSE is better if all you care about is portability, and you don't mind the heavy performance cost. My module could be ported for FUSE of course, but it's a lot more work than it may seem, the kernel interfaces are too alien compared to a userland library.
ZFS is pretty portable because of spl ;)
Too late, the Linux fanboys have already struck!