by jmillikin 1402 days ago
I wonder if this could replace most uses of NBD (network block devices), and/or help get iSCSI into userspace where more flexible load-balancing policy can be implemented.

It also reminds me of attempts to define BUSE[0][1][2], which would have been a block device equivalent of FUSE. IIRC attempts to get BUSE into the Linux kernel have been blocked for performance reasons -- the FUSE protocol isn't well designed and is only barely acceptable for VFS.

If io_uring (+ careful use of zero-copy) has fixed the performance issues with userspace block devices, maybe it would be applicable to FUSE (or FUSE-v2)? I've tried using io_uring with the current FUSE protocol to reduce syscall overhead and it kinda works, but a protocol designed to operate in that mode from the beginning would be even better.

[0] https://github.com/acozzette/BUSE

[1] https://dspace.cuni.cz/bitstream/handle/20.500.11956/148791/...

[2] https://dl.acm.org/doi/10.1145/3456727.3463768

4 comments

The SPDK project is certainly looking to use this to replace our limited use of NBD, as well as present SPDK block devices as kernel block devices, including devices backed by userspace implementations of iSCSI, NVMe-oF, and various other network protocols.

I had the same thought re: FUSE. I'm tentatively thinking of getting back into programming by working some on sshfs (because I'm bored and think it's important, while it's maintainer-less and very squarely within my specialty). Not until early September, though, since right now life is consumed by end-of-summer stuff and then getting my daughter off to college. Anyhow, within that context I've also thought about FUSE (which I also have some experience with since I added SELinux tag support) plus io_uring. Certainly nothing's likely to happen right away, but it will be on my personal roadmap.

I'd be careful about giving up the network capabilities. With NBD it's very useful to move the client and server apart, either having them both run in userspace on the same machine over a Unix domain socket or talking remotely over the network. For our case this is by far the most common use of NBD; we hardly use nbd.ko at all.
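
To illustrate the client/server split described above, here is a toy Python sketch in which both ends run in userspace and talk over a Unix domain socket pair. The wire format (`REQUEST`) and the names `serve_reads`, `remote_read`, and `demo` are invented for this example; this is not the NBD protocol, only the general shape of "send offset and length, get bytes back".

```python
import socket
import struct
import threading

# Hypothetical wire format for this sketch only (NOT the NBD protocol):
# a big-endian u64 offset followed by a u32 length.
REQUEST = struct.Struct("!QI")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def serve_reads(sock: socket.socket, disk: bytes) -> None:
    """Server side: answer read requests until the client disconnects."""
    with sock:
        while True:
            try:
                header = recv_exact(sock, REQUEST.size)
            except ConnectionError:
                return
            offset, length = REQUEST.unpack(header)
            # Reads past the end of the toy "disk" are zero-padded.
            sock.sendall(disk[offset:offset + length].ljust(length, b"\0"))


def remote_read(sock: socket.socket, offset: int, length: int) -> bytes:
    """Client side: request `length` bytes at `offset`."""
    sock.sendall(REQUEST.pack(offset, length))
    return recv_exact(sock, length)


def demo() -> bytes:
    disk = bytes(range(256)) * 16  # 4 KiB of recognizable data
    client, server = socket.socketpair()  # AF_UNIX pair on Unix-like systems
    worker = threading.Thread(target=serve_reads, args=(server, disk))
    worker.start()
    with client:
        data = remote_read(client, 256, 4)
    worker.join()
    return data
```

Because the client only depends on a connected socket, swapping the socket pair for a TCP connection would move the server to another machine without touching `remote_read`.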
Is BUSE significantly different from CUSE (“character device”)?

https://lwn.net/Articles/308445/

Yep! Character devices are much closer to "stream of bytes", and from the FUSE perspective they look like a single file with limited operations (open, close, read, write). Think of something like a mouse (sending a stream of motion/click events) or a webcam (sending a stream of frames, receiving basic control commands). If you've written even the most basic FUSE layer, you've got all the necessary handlers to implement CUSE too.

Block devices operate on blocks of data identified by offset. Hard disks, CD-ROM drives, USB sticks, basically anything where it'd make sense to say "read (or write) these 1024 bytes at offset 0x10000".
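
To make the "these 1024 bytes at offset 0x10000" model concrete, here is a minimal Python sketch. It assumes a POSIX system (for `os.pread`/`os.pwrite`) and uses a sparse temporary file as a stand-in for a real device; `write_block`, `read_block`, and `demo` are made-up names, not part of any kernel or FUSE API.

```python
import os
import tempfile

BLOCK_SIZE = 1024  # the "1024 bytes" from the example above


def write_block(fd: int, offset: int, data: bytes) -> None:
    """Write exactly one block at an absolute, block-aligned byte offset."""
    if len(data) != BLOCK_SIZE or offset % BLOCK_SIZE != 0:
        raise ValueError("block I/O must be block-sized and block-aligned")
    if os.pwrite(fd, data, offset) != BLOCK_SIZE:
        raise OSError("short write")


def read_block(fd: int, offset: int) -> bytes:
    """Read exactly one block at an absolute, block-aligned byte offset."""
    if offset % BLOCK_SIZE != 0:
        raise ValueError("block I/O must be block-aligned")
    data = os.pread(fd, BLOCK_SIZE, offset)
    if len(data) != BLOCK_SIZE:
        raise OSError("short read")
    return data


def demo() -> bytes:
    # A sparse temporary file plays the role of the "disk" in this sketch.
    with tempfile.TemporaryFile() as disk:
        disk.truncate(0x20000)
        fd = disk.fileno()
        write_block(fd, 0x10000, b"\xab" * BLOCK_SIZE)
        return read_block(fd, 0x10000)
```

Note that there is no seek position or stream state here: every request carries its own offset, which is the main contrast with the character-device case.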

You can in principle implement a block device-ish API in FUSE by disabling open/close and requiring all reads/writes to be at given offsets -- IIRC this is how the "fuseblk" mode added for ntfs-3g works -- but the protocol is too chatty to be fast enough for things people want block devices for.

I've also heard the kernel's block layer error handling doesn't interact well with the FUSE protocol, but I don't know the details too well on that.

> Block devices operate on blocks of data identified by offset.

Sure, although CUSE read and write operations take offsets, too. The kernel could just send block-sized IOs to a CUSE driver and it wouldn't be all that different.
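
As a rough sketch of what "block-sized IOs" would mean here, the following Python helper maps an arbitrary byte range onto the aligned, fixed-size (offset, length) requests that cover it. `block_ios` is a hypothetical helper and the 4096-byte default is an assumption; it shows only the arithmetic, not how the kernel or CUSE actually issues requests.

```python
def block_ios(offset: int, length: int, block_size: int = 4096) -> list[tuple[int, int]]:
    """Return the block-aligned (offset, length) requests covering a byte range."""
    if offset < 0 or length < 0 or block_size <= 0:
        raise ValueError("offset and length must be >= 0, block_size must be > 0")
    if length == 0:
        return []
    first = offset // block_size                 # index of the first touched block
    last = (offset + length - 1) // block_size   # index of the last touched block
    return [(i * block_size, block_size) for i in range(first, last + 1)]
```

Each resulting pair could be passed to an offset-taking read or write handler unchanged, which is the sense in which the two device types would not look all that different to a driver.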

> You can in principle implement a block device-ish API in FUSE by disabling open/close and requiring all reads/writes to be at given offsets

Right, ok.

I think the historical distinction between block and character devices is largely that -- historical. Nowadays the distinction is mostly whether or not the kernel puts a block cache in front of the device. FreeBSD eliminated the distinction entirely.
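
For anyone unfamiliar with what "a block cache in front of the device" amounts to, here is a deliberately simplified Python sketch: a small read-only LRU cache keyed by block number. `BlockCache` and its `read_block` callback are invented for illustration and bear no resemblance to the real kernel page/buffer cache, which also handles writes, dirty tracking, and readahead.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Toy read-only LRU cache sitting in front of a slow block reader."""

    def __init__(self, read_block: Callable[[int], bytes], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._read_block = read_block   # hypothetical backing-device callback
        self._capacity = capacity
        self._blocks = OrderedDict()    # block number -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(block_no)  # mark as most recently used
            return self._blocks[block_no]
        self.misses += 1
        data = self._read_block(block_no)       # only a miss reaches the device
        self._blocks[block_no] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict the least recently used
        return data
```

A character-device-style path would call the backing reader on every request; with the cache in front, repeated reads of a hot block never reach the device.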

There may be kernels that have simplified their device model to unify character and block devices, but Linux has not. FUSE/CUSE (and now ublk) have been Linux-oriented protocols from the beginning, with relatively little thought given to cross-platform compatibility.

If you use FreeBSD then you're likely familiar with the challenges they've faced adapting FUSE to their VFS, and last time I checked they don't have plans to support CUSE at all.

You might also be interested in <https://lwn.net/Articles/343514/> (from 2009!), which discusses some of the challenges with using something like the FUSE protocol to back a block device in Linux. That message also describes a better solution which, to my eyes, looks a lot like ublk.

FreeBSD does have CUSE, for what it’s worth.

FreeBSD has a /dev/cuse device, and a libcuse reimplemented on top of it, but it uses a different protocol from Linux's CUSE. You can see the FreeBSD implementation at https://github.com/freebsd/freebsd-src/blob/release/13.1.0/s... -- note how cuse_server_read() and cuse_server_write() are stubs.

I am somewhat familiar with this because I wrote a FUSE/CUSE server library in Rust, and tried porting it to FreeBSD. The FUSE bits worked with only minor issues[0][1], but the CUSE bits were completely different so I had to turn off that part of the library for FreeBSD targets.

[0] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253411

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253500