Hacker News
by quotemstr 1830 days ago
Why would you want to bypass the filesystem by talking to the block device directly? Doesn't O_DIRECT on a preallocated regular file accomplish the same thing with less management complexity and special OS permissions? Granted, the file extents might be fragmented a bit, but that can be fixed.
2 comments

A "regular file" might reside in multiple locations on disk for redundancy, or might have a checksum that needs to be maintained alongside it for integrity. Or, as you say, its contents might not reside in contiguous sectors - or you might be writing to a hole in a sparse file. There's a lot of "magic" that could go on behind the scenes when operating on "regular files", depending on what filesystem you're using with what options. Directly operating on the block device makes it easier to reason about the performance guarantees, since your reads and writes map more cleanly to the underlying SCSI/ATA/NVMe commands issued.
If you understand your workload and the hardware well enough to know how direct I/O on a file will help, then you’re generally going to do better going straight to the block device, because there are fewer intermediate layers doing the wrong optimizations or otherwise messing you up. From a pure performance perspective, anyway. Extents are one part of the issue; so are flushes to disk (and how/when they happen), caching, etc.
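
For what it's worth, the "hole in a sparse file" case is something you can at least check for from userspace. A rough sketch in Python, assuming Linux and a filesystem that reports holes through SEEK_HOLE (filesystems that don't simply report the whole file as data); `first_hole` is a made-up helper name, not anything from a library:

```python
import os

def first_hole(path):
    """Return the offset of the first hole before EOF, or None if none is reported."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return None
        # SEEK_HOLE returns the first hole at or after the given offset.
        # Every file has an implicit hole at EOF, so offset == size means "no real hole".
        offset = os.lseek(fd, 0, os.SEEK_HOLE)
        return offset if offset < size else None
    finally:
        os.close(fd)
```

A result of None only means the filesystem did not report a hole; it says nothing about whether the extents are contiguous on disk.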

Doesn’t mean it isn’t easier to deal with as a file from an administration perspective (and you can do snapshots, or whatever!), but LVM can do that too for a block device, and many other things.

With O_DIRECT though you're opting out of the filesystem's caching (well, VFS's), forced flushes, and most FS level optimizations, so I'd expect it to perform on par with direct partition access.
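
To make the "O_DIRECT on a preallocated regular file" setup concrete, here is a minimal sketch in Python, assuming Linux. The 4096-byte alignment is an assumption (real code should query the device's logical block size), and `preallocate` / `write_block` are hypothetical helper names. It falls back to a buffered write when the filesystem rejects O_DIRECT, which some do:

```python
import mmap
import os

BLOCK = 4096  # assumed alignment for offset, length, and buffer address

def preallocate(path, size):
    """Create `path` and reserve `size` bytes up front so later writes don't allocate."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)

def write_block(path, payload, offset=0):
    """Write one zero-padded BLOCK at `offset`; return True if O_DIRECT was used."""
    assert len(payload) <= BLOCK and offset % BLOCK == 0
    direct = getattr(os, "O_DIRECT", 0)
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned and zero-filled
    buf[:len(payload)] = payload
    try:
        # Try O_DIRECT first, then plain buffered I/O as a fallback.
        for flags in (os.O_WRONLY | direct, os.O_WRONLY):
            try:
                fd = os.open(path, flags)
            except OSError:
                continue  # e.g. filesystem rejects O_DIRECT at open time
            try:
                os.pwrite(fd, buf, offset)
                return bool(flags & direct)
            except OSError:
                continue  # e.g. EINVAL at write time
            finally:
                os.close(fd)
        raise OSError("could not write block")
    finally:
        buf.close()
```

Usage would be something like `preallocate("data.bin", 1 << 20)` followed by `write_block("data.bin", b"hello")`. Note this says nothing about durability: O_DIRECT skips the page cache but does not by itself flush the device's write cache.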

Do you have numbers showing an advantage of going directly to the block device? Personally, I'd consider the management advantages of a filesystem compelling absent specific performance numbers showing the benefit of direct partition access.

You do, when the filesystem actually respects O_DIRECT, which isn’t always. The point is that you have more layers. If you’re trying to be as direct as possible, more layers are unhelpful.

Since you get most of the same advantages management-wise with LVM while using the block interface (including snapshots, resizing, and all the other management goodies), you’re not exactly getting much extra functionality either.

Your concerns are all theoretical and the management disadvantages of direct partition access are real with or without LVM (which itself is exactly the sort of middle layer you claim to be worried about.)

Do you have numbers or not?

Ah, but now you’re moving the goalposts it seems?

Since most of what we’re talking about is unnecessary complexity for no real gain, what concrete metric do you think would be useful exactly? I just pointed out that you can get the same management advantages without it (say for a dev environment or rollbacks or whatever). And you get a simpler, cleaner story without extra layers if you don’t want to use LVM (such as in production), which you can’t get from O_DIRECT.

I also have this thread from Linus calling O_DIRECT brain damaged and saying never to use it. [https://lkml.org/lkml/2007/1/10/233]

The problem is some of the alternatives seem to be suggested by way of "if we had any support for this it would be better than O_DIRECT". So don't use O_DIRECT, use the alternative which doesn't exist, is still too slow, only covers parts of what you need, etc.
I respect Linus, but he has a problem where he never ever backtracks and admits he was wrong about something. Take C++ for example.