by eafer 1823 days ago

> compatible with macOS filesystems (HFS+ and APFS)

How far along is this? I think she's underestimating how hard it is to implement a modern filesystem that won't eat users' data. I've been working on a Linux APFS driver[0] for several years, and it's not fully functional yet. It's a pity that she is working with FreeBSD, or it could have been of use to her.

[0] https://github.com/eafer/linux-apfs-rw

5 comments

gizdan 1823 days ago

> It's a pity that she is working with FreeBSD, or it could have been of use to her.

I suppose FreeBSD made more sense as a base considering MacOS is derived from BSD.

the_why_of_y 1823 days ago

FreeBSD differs quite substantially from the XNU kernel used by MacOS because XNU is based on Mach, and it was forked (Edit: from 4.3BSD) in 1988 - before Linux even existed.

The XNU kernel does not have a stable syscall ABI so perhaps it doesn't matter if the syscalls are different because the implementation of libSystem can convert as appropriate in userspace (see also: WINE).

throw0101a 1823 days ago

> MacOS because XNU is based on Mach, and it was forked (Edit: from 4.3BSD) in 1988

As another commenter noted: XNU = Mach + FreeBSD.

What you are referring to is what NeXTSTEP was, Mach + BSD4:

* https://en.wikipedia.org/wiki/NeXTSTEP#Unix

By the time Apple got to them it was a few years later, and so they decided to update that part of the kernel, and also brought in FreeBSD's userland.

nix23 1823 days ago

>XNU is based on Mach

Partially, XNU is Mach AND the FreeBSD-Kernel:

https://www.youtube.com/watch?v=-7GMHB3Plc8

rjzzleep 1823 days ago

If you look at the source of a lot of drivers on MacOS a few years ago, they were heavily based on the FreeBSD drivers.

nix23 1823 days ago

Yes, sure they do, just look at the presentation.

trasz 1823 days ago

Could you perhaps relicense your implementation so it could be used outside of Linux?

eafer 1822 days ago

I don't think I can, I'm using gpl code from other parts of the kernel. I'm not sure I would want to either, I put a lot of work into this and the gpl gives me more of a feeling of ownership.

That said, there's nothing stopping you or anyone else from reworking my code into a (gpl-licensed) FUSE driver. I don't think it's a straightforward task, but it can definitely be done.

NotEvil 1823 days ago

I don't think it's a licensing issue. It's the implementation, as both kernels use different syscalls and have different architectures.

trasz 1823 days ago

Syscalls are mostly the same, but indeed, the interface between the kernel and the file systems is very different. However, code which implements that interface on the file system is a relatively small part of the whole thing; most of the code should be reusable.

Historical note: FreeBSD used to support XFS; I believe it was ported from Linux.

rjsw 1823 days ago

It takes a lot of work to get the Linux GPU drivers to build for other operating systems.

trasz 1823 days ago

True. Although it’s way easier than it used to be, thanks to the linuxkpi layer - the piece of the FreeBSD kernel which implements various Linux kernel APIs.

greyhair 1822 days ago

A lot of us have long been tired of the "all things must be Linux" mantra. It is nice having options, having other environments.

naikrovek 1821 days ago

100% correct, and I think FreeBSD is woefully underrated.

fn1 1823 days ago

> how hard it is to implement a modern filesystem that won't eat users' data.

Why is that so hard? Because of edge-cases? Caching/Timing considerations?

eafer 1823 days ago

The main problem is simply that people really really really don't like losing their data after they saved it to disk. A simple app that corrupts its in-memory state once a year is probably acceptable. A filesystem that corrupts its on-disk state once a year is pure garbage. You basically need to aim for zero bugs.

How hard this is depends on the filesystem. Something like FAT, for example, is pretty much designed for ease of implementation, with few edge cases. Modern filesystems are not like that at all: the data structures are very complicated, so they must be extremely well tested before they are good enough to use. That would probably require an fsck to check for subtle inconsistencies; in the case of APFS you can use mine, but it's still very incomplete. Apple's published fsck is not very thorough.

As an example of the kind of problems to expect, I recall a bug in the Linux HFS+ driver. If you had a drive with lots of short filenames and lots of long filenames, and you started deleting the short filenames, eventually you would lose half of your files. This kind of thing happens because HFS+ has variable-length keys in the index nodes of its trees, so deleting a record may trigger a complicated cascade of node splits. APFS inherited this feature, and it was very annoying to implement.

But HFS+ is very well documented; APFS is not, and that doesn't help.
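A minimal sketch of why a deletion can force a split, assuming the usual B-tree layout where an index node stores a copy of each child's first key. This is not the HFS+ or APFS on-disk format: `NODE_CAPACITY`, the one-byte length prefix, and the function names are made up for illustration. When the first (short) key of a leaf is deleted, the parent has to store the next key instead, and if that key is longer the parent can overflow.

```python
NODE_CAPACITY = 32  # bytes available for keys in one node (made-up number)


def used_bytes(keys):
    """Space taken by variable-length keys: one length byte plus the key itself."""
    return sum(1 + len(k.encode("utf-8")) for k in keys)


def delete_from_leaf(index_keys, leaves, leaf_no, key):
    """Delete `key` from a leaf and refresh the parent's copy of the leaf's first key.

    Returns True if the parent index node no longer fits and would need a split.
    """
    leaf = leaves[leaf_no]
    was_first = leaf[0] == key
    leaf.remove(key)
    if was_first and leaf:
        # The index node stores each child's first key, so it must be rewritten,
        # and the replacement key may be longer than the one it replaces.
        index_keys[leaf_no] = leaf[0]
    return used_bytes(index_keys) > NODE_CAPACITY


# Two leaves, each starting with a short name followed by a long one.
leaves = [
    ["a", "a-very-long-file-name.txt"],
    ["b", "b-very-long-file-name.txt"],
]
index_keys = [leaf[0] for leaf in leaves]  # the parent starts out nearly empty

needs_split_1 = delete_from_leaf(index_keys, leaves, 0, "a")
needs_split_2 = delete_from_leaf(index_keys, leaves, 1, "b")
print(needs_split_1, needs_split_2)
```

In this toy run the first deletion still fits, while the second pushes the parent past its capacity. A real driver then has to split the parent, which may in turn change the keys stored one level up, hence the cascade.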

hnlmorg 1822 days ago

It's worse than this. You need to aim not just for zero bugs, but for zero bugs despite working with hardware that can degrade with use and whose firmware often does have bugs.

jscipione 1821 days ago

And yet this didn't stop Apple from automatically converting HFS+ volumes to APFS in iOS 10.3 and macOS 10.13.0 soon after the APFS beta dropped in macOS 10.12.5 and it didn't stop Apple from requiring APFS for all volumes in macOS 10.14+. Apple must have been pretty confident that APFS was working reliably to be so bold.

eafer 1821 days ago

Not sure why you are telling me this, I don't know anything about Apple's internal development process. I assume they did run a lot of tests. But I recall at least one serious bug early on too[0].

[0] https://www.theregister.com/2018/02/16/apple_file_system_bug...

trasz 1822 days ago

HFS+ is open source, so you don’t even need to rewrite it from scratch.

bmn__ 1823 days ago

The big problems continue to be

• C being a shitty language that does not force or even encourage programmers to handle errors

• implementation knowledge about file system technology is generally stuck in the 1990s

• disk controller hardware lying to the OS to make them appear more performant than they really are

visit https://danluu.com & ctrl+f "files"

eafer 1822 days ago

> C being a shitty language that does not force or even encourage programmers to handle errors

I don't see where this is coming from. Most of the world's top filesystems are written in C, and they work just fine. Maybe other languages could get better results, but it's hard to say with so little data.

> implementation knowledge about file system technology is generally stuck in the 1990s

If you are talking about me, that might be true, I'm relatively new to this and still learning. But there's definitely people out there with some serious "implementation knowledge". And tools like xfstests did not exist in the 1990s, that makes a huge difference.

naasking 1821 days ago

I'm curious why you think file systems written in C would somehow be better than any other type of program written in C. We have plenty of data suggesting C programs have more bugs and vulnerabilities than programs written in safer languages.

eafer 1821 days ago

If you are thinking about rust, we don't have any data about it in the context of the kernel yet, not even for device drivers. It may prevent some exploitable bugs, but those aren't a big concern for filesystems - otherwise they wouldn't be put inside the kernel at all. The reality is, we don't know if it would help; and given how conservative we all are with our filesystem implementations (for good reasons), it's possible that we never will.

naasking 1821 days ago

I'm not thinking about anything specific, I'm simply disputing your general claim that moving away from C would not help, when it has clearly helped in every single other domain of software development. I see no real reason why file systems should be any different, but clearly you do, so I was asking you why you think file systems would be different.

And for specific examples of options with better safety records, then sure Rust would be one possibility, as would Ada, or Frama-C if you need to stick with C.

queuebert 1822 days ago

Can you elaborate on the second point?

bmn__ 1822 days ago

courses offered by polytechnics and online learning platforms not being up-to-date, restricted and uneven access to expert implementers

ma2rten 1823 days ago

I think they are underestimating in general how much work it is. ReactOS has been developed for 23 years and is not in a usable state yet.

cmiller1 1823 days ago

However ReactOS has a more challenging/bigger goal for a few reasons. It aims to be binary compatible, not just source compatible. Much more of the underlying tech is proprietary while the BSD subsystem on macOS is free for them to use.

ma2rten 1823 days ago

"eventual compatibility with x86-64 macOS binaries (Mach-O) and libraries"

zephyr9 1823 days ago

> It's a pity you are working with Linux, or it could have been of use to her.

FTFY

eafer 1823 days ago

Oh, I wasn't trying to diss FreeBSD. It would probably have been better to write my driver for fuse to make it portable, but it's too far along at this point.

atombender 1823 days ago

Out of interest, why can't file system implementations be portable? A shared core that exists as a library that can be tested in userland, and which can then be used by a lightweight shim that implements the kernel-facing interfaces. But I haven't seen any file systems implemented this way.
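A rough sketch of the layering being asked about, in Python for brevity. Everything here is hypothetical (`BlockDevice`, `ToyFsCore`, `PathShim`, and the one-file-per-block format are invented for illustration and do not come from any real filesystem): the core only talks to an abstract block device, so it can be tested in userland against memory, and the OS-specific part shrinks to a thin adapter.

```python
from abc import ABC, abstractmethod

BLOCK_SIZE = 32  # toy value


class BlockDevice(ABC):
    """The only thing the core knows about the outside world."""

    @abstractmethod
    def read_block(self, n: int) -> bytes: ...

    @abstractmethod
    def write_block(self, n: int, data: bytes) -> None: ...


class MemoryDevice(BlockDevice):
    """Userland backend, good enough for tests."""

    def __init__(self, nblocks: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = bytes(data)


class ToyFsCore:
    """Portable core: one file per block, laid out as [name_len][data_len][name][data]."""

    def __init__(self, dev: BlockDevice, nblocks: int):
        self.dev, self.nblocks = dev, nblocks

    def _parse(self, n: int):
        block = self.dev.read_block(n)
        name_len, data_len = block[0], block[1]
        name = block[2:2 + name_len]
        return name, block[2 + name_len:2 + name_len + data_len]

    def _slot(self, name: bytes):
        """Block already holding `name`, else the first free block, else None."""
        free = None
        for n in range(self.nblocks):
            stored, _ = self._parse(n)
            if stored == name:
                return n
            if not stored and free is None:
                free = n
        return free

    def write_file(self, name: bytes, data: bytes) -> None:
        if not name or 2 + len(name) + len(data) > BLOCK_SIZE:
            raise OSError("empty name, or name/data too long for this toy format")
        n = self._slot(name)
        if n is None:
            raise OSError("no free block")
        record = bytes([len(name), len(data)]) + name + data
        self.dev.write_block(n, record.ljust(BLOCK_SIZE, b"\0"))

    def read_file(self, name: bytes) -> bytes:
        for n in range(self.nblocks):
            stored, data = self._parse(n)
            if stored == name:
                return data
        raise FileNotFoundError(name)


class PathShim:
    """Stand-in for the thin OS-specific layer (VFS hooks, FUSE callbacks, ...)."""

    def __init__(self, core: ToyFsCore):
        self.core = core

    def read(self, path: str) -> bytes:
        return self.core.read_file(path.lstrip("/").encode())

    def write(self, path: str, data: bytes) -> None:
        self.core.write_file(path.lstrip("/").encode(), data)


fs = PathShim(ToyFsCore(MemoryDevice(4), 4))
fs.write("/hello", b"world")
print(fs.read("/hello"))
```

Only `PathShim` and the `BlockDevice` backend would need rewriting per platform in this picture. Real kernels complicate it because caching, locking, and memory management tend to leak into the core, which a toy like this ignores.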

Conan_Kudo 1823 days ago

The irony is that most of the newer ones in Linux are written that way, just nobody writes a FUSE binding to them. For example, both Btrfs and XFS contain fully functional pure-userspace implementations of the filesystem inside their userspace code (libbtrfs and libxfs both contain the complete filesystem code), but nobody has written a FUSE binding to either.

Btrfs' implementation is even used to do most of the functionality in the tools (like send/receive) rather than using the kernel to do it (in order to minimize context switches that reduce I/O performance).

codetrotter 1823 days ago

FUSE is neat for one-off access to obscure file systems, and FUSE is usable for things like sshfs for those who use that, but it is my impression from having used FUSE myself that FUSE does come with a noticeable performance penalty. Is my perception wrong on this? Because if it is not then I’d be hesitant to use a system where the main storage was relying on FUSE.

aidenn0 1823 days ago

FUSE is also great for slower media. Copying files over USB sticks without being restrained by FAT-32 limitations is great. There have been times that I've used NTFS-3g for this because its FUSE implementation meant I could read it on any OS.

lapinot 1823 days ago

Stuff like gocryptfs seems to push FUSE to quite high performance. There is still some latency but streaming read/write seems to go fast.

trasz 1823 days ago

This is also how the original BSD UFS/FFS was developed some three decades ago.

aidenn0 1823 days ago

While the kernel interfaces aren't exactly wildly disparate, they are perhaps surprisingly diverse (and coupled to kernel implementation details) for something as boring as a filesystem. At least some of this is for performance reasons. Accessing offsets into kernel structures is usually going to be faster than copying the data to use on your own.

On top of that, since there is no single standard interface in the kernel, kernel maintainers are skeptical of adding shims to the mainline kernel. The Linux kernel developers are perhaps the most famous for being vocal about this, but they are not unique.

That being said, there are portable file system implementations that use FUSE. See e.g. NTFS-3g[1]

1: https://en.wikipedia.org/wiki/NTFS-3G

eafer 1822 days ago

Shims in the kernel seem a bit anti-social, since all module developers are expected to contribute to the shared core code. Of course it can be done, and it has been done. But I think FUSE is better if all you care about is portability, and you don't mind the heavy performance cost. My module could be ported to FUSE of course, but it's a lot more work than it may seem: the kernel interfaces are too alien compared to a userland library.

nix23 1823 days ago

ZFS is pretty portable because of spl ;)

zephyr9 1820 days ago

Too late, the Linux fanboys have already struck!