| HN Mirror



	by BuildTheRobots 49 days ago
	Out of curiosity, why were you using a >50 GB file on a dataset as an iSCSI target rather than a zvol, or did I misunderstand?


oasisaimlessly 49 days ago

Why use zvols? Aren't they essentially just single-file ZFS datasets (allowing e.g. independent snapshotting)?


ssl-3 49 days ago

They are as you describe.

Except zvols present as real-live block devices that can do block-device things instead of regular-file things, and that's important for some stuff.

But AFAICT, iSCSI targets on Linux are not one of those things. They don't care; they work the ~same whether backed by files or block devices.
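To make the file-versus-block-device distinction concrete, here is a minimal Python sketch that reports how a given path presents itself to the OS. The helper name `backing_kind` is hypothetical, and the remark about zvol device paths in the comment is an assumption about a typical Linux OpenZFS setup, not something stated in this thread.

```python
import os
import stat
import tempfile


def backing_kind(path: str) -> str:
    """Classify a path roughly the way a storage target would see it."""
    # os.stat follows symlinks. Assumption: on Linux, zvols usually show up
    # as symlinks under /dev/zvol/<pool>/<name> that point at a block device.
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    # A temporary file stands in for a file-backed LUN image.
    with tempfile.NamedTemporaryFile() as f:
        print(f.name, "->", backing_kind(f.name))
```

Pointing it at a zvol's device node instead of the temporary file should report a block device, which is the only difference a consumer sees at this level.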

And in the benchmarks I can find that compare zvols against files on ZFS, files usually win.

> Why use zvols?

Probably for the same reasons that people recommended separate disk partitions for /var, /usr, and such ~30 years ago, when I got started with desktop *nix systems.

That reason seemed to boil down to: "If it was good for a Sun/3 in 1986, then it must also be good for a Linux box in 1996." It was a dumb reason.

tl;dr, folklore. :)


wahern 49 days ago

> That reason seemed to boil down to: "If it was good for a Sun/3 in 1986, then it must also be good for a Linux box in 1996." It was a dumb reason.

ext2 disk corruption, especially on power failure or a crash, was a common threat in the 1990s. Not merely to the point of requiring fsck and a bunch of orphaned files (which was inevitable on an unclean shutdown), but just totally fubar'd, requiring a reformat. The only thing worse was then trying to reinstall Slackware from the floppy disks, at least one of which had a better than even chance of corruption from just sitting in the drawer since the last reinstall, requiring another long night nursing a download over the 2400 baud modem.

I use OpenBSD, and while FFS2 has been far more robust than 1990s Linux ext2, smart partitioning is still warranted, not just for minimizing blast radius, but also for managing backups, etc. I haven't had the chance to use ZFS, and it may be the only filesystem I would consider skipping partitioning for on a workhorse system, but even if you trust the design and code quality of ZFS, it's running unprotected alongside a bunch of horribly buggy kernel subsystems and drivers, so....


ssl-3 49 days ago

You raise an interesting point. Please allow me to enhance it.

It could get worse than reinstalling Slackware, again, from floppies. I didn't get to experience corrupted floppies; I instead had a habit of recycling my Slackware disksets for other purposes after the system was up and running. So any complete re-install started by booting up MS-DOS to run Telemate to start downloading them fresh from Sunsite...again.

But at least it was Telemate, so I could manage files to free up more floppy disks while this process slowly continued at [I guess I was fortunate] 9600 or 14.4kbps. ;)

I don't recall much difficulty with ext2 being fragile (though I can provide horror stories about OS/2's HPFS). If I had issues with it, they didn't leave any scars.

But I accept your correction. It may have been the case that splitting the filesystem into different partitions made sense because ext2 was fickle, and I was just very lucky in deliberately ignoring that advice after the first time I misjudged the partition sizes at install and ran out of space in some directory or other.

Hard drives seemed so small back then. Installing a real OS meant a serious tradeoff in the ratio between user data and system data.

---

Anyway, ZFS. The ZFS way is that it owns the whole disk -- for a long time, the preferred method didn't even use partitions at all. Nowadays OpenZFS does create one partition for itself by default, but it uses the whole disk just the same.

Blast radius is limited by having different datasets (think "filesystem-light"), and read-only snapshots, and easy, consistent backups (if you have a compatible device or service to send them to -- otherwise, it's ~the same backup dance as any other filesystem with snapshots).

It's a different way of doing things, like a subsystem in and of itself. It keeps its own caches and generally wants to be as close to the metal as it can be. Which sounds scary, but meh: Almost everything worth doing gets done with two commands, zfs and zpool, and the syntax has been consistent enough over the years that old documentation from Sun still has value.

I've been using it for most of a decade now and I find it to be ridiculously good. My only wish is that it could be a first-rate player on Linux, but license incompatibilities be that way sometimes.


rincebrain 49 days ago

The reason to use zvols is twofold, AFAIR:

- serving a bunch of storage as a blob is a common use case for e.g. iSCSI exporting, and so, if you want to be able to zfs snapshot/send/rollback/etc on the level of "one logical disk", it makes sense to have an optimized route to expose that rather than making you expose a filesystem that only has one file on it to do the same dance

- avoiding unnecessary overhead/complexity from the FS layer being involved when all you really care about is exposing a single block device of storage
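The two ways of getting "one logical disk" can be sketched side by side. The Python below only builds command lines and prints them; it does not run anything. The pool and dataset names (`tank/lun0`, `tank/luns`), the mountpoint, the `disk.img` filename, and the helper names are all hypothetical. `zfs create -V`, `zfs create`, `zfs snapshot`, and `truncate -s` are the real commands being assembled.

```python
import shlex
from typing import List


def zvol_create_cmd(name: str, size: str) -> List[str]:
    # A zvol: `zfs create -V <size> <name>` makes a volume with a set size.
    return ["zfs", "create", "-V", size, name]


def file_backed_cmds(dataset: str, mountpoint: str, size: str) -> List[List[str]]:
    # The alternative: a dataset holding a single sparse image file.
    # The mountpoint is passed in explicitly because it is site-specific.
    return [
        ["zfs", "create", dataset],
        ["truncate", "-s", size, f"{mountpoint}/disk.img"],
    ]


def snapshot_cmd(name: str, label: str) -> List[str]:
    # Snapshots use the same syntax for a zvol and for a dataset.
    return ["zfs", "snapshot", f"{name}@{label}"]


if __name__ == "__main__":
    plan = [zvol_create_cmd("tank/lun0", "50G")]
    plan += file_backed_cmds("tank/luns", "/tank/luns", "50G")
    plan += [snapshot_cmd("tank/lun0", "before-upgrade"),
             snapshot_cmd("tank/luns", "before-upgrade")]
    for argv in plan:
        print(shlex.join(argv))
```

Either way the snapshot step is one command per logical disk; the difference is whether the thing being snapshotted is a volume or a dataset with one file in it.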

Of course, in the era where you're sad that inline compression/checksum/etc are bottlenecking your 48 NVMe pool, that probably isn't where you'd reach for optimizing first...or second...

But just exposing the block storage is sufficiently useful that at least one of the original projects to port ZFS to Linux wasn't planning to implement the FS layer; they just wanted block storage for Lustre.


ssl-3 49 days ago

I felt the same way about it as you before I started looking for benchmarks as I wrote my previous comment. :)

After all: Why would zvols exist at all if they weren't superior in important ways?

> it makes sense to have an optimized route to expose that rather than making you expose a filesystem that only has one file on it to do the same dance

It's important to note that additional datasets are essentially free on ZFS; it's no big deal to have lots of them (millions of millions of them is A-OK), and datasets don't have a pre-determined size like zvols do.

Although zvols can also be grown and shrunk, just as files [within datasets] can be.

Both datasets and zvols make the same kind of mess out of zfs list's unfiltered output.

But zvols introduce a new concept, while anyone who uses ZFS is already familiar with datasets that contain files.

I think this part is a wash, and that it comes down to operator preference.

> avoiding unnecessary overhead/complexity from the FS layer being involved when all you really care about is exposing a single block device of storage

Maybe? Again, the benchmarks I found (hours ago now and tabs long-closed; I'll find more if anyone insists) suggested that files were faster than zvols, which suggests reduced overhead. (It's very possible that the tests I found were naively implemented, but then: It's also possible for any of us to do something naive.)

Anyway, it's interesting to think about.

It seems like the right answer is to test with one's own workload and find the best fit, instead of assuming that one way is better than the other.
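As a rough starting point for that kind of test, here is a small Python sketch that times random 4 KiB writes followed by an fsync against whatever path it is given. It is deliberately naive (no direct I/O, no queue depth, no warm-up), so treat the numbers as a sanity check rather than a benchmark; a dedicated tool such as fio is the usual choice for real measurements. The function name and parameters are hypothetical, and the writes are destructive, so only point it at scratch storage.

```python
import os
import random
import tempfile
import time


def time_random_writes(path: str, span_bytes: int,
                       count: int = 256, block: int = 4096) -> float:
    """Overwrite `count` random block-aligned offsets within the first
    `span_bytes` of `path`, fsync once, and return elapsed seconds.
    WARNING: destroys existing data in that range."""
    slots = span_bytes // block
    if slots < 1:
        raise ValueError("span_bytes must be at least one block")
    buf = os.urandom(block)
    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.pwrite(fd, buf, random.randrange(slots) * block)
        os.fsync(fd)  # include the flush so cached writes are counted
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo against a 16 MiB scratch file; substitute a scratch zvol device
    # or a scratch image file on a dataset to compare the two.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(16 << 20)
        scratch = f.name
    try:
        print(f"{time_random_writes(scratch, 16 << 20):.4f} s")
    finally:
        os.remove(scratch)
```

Running the same call against a scratch zvol and a same-sized scratch file on a dataset, several times each, gives a first impression of which one suits a given pool and workload.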

For its part, ZFS should handle a zvol and a file-on-a-dataset with equal stoicism and reliability.


rincebrain 47 days ago

Sure, I'm not suggesting that they're a good idea to use blindly at this point - I think most people are building on filesystem-based setups so most of the polish is going there.

But that was the original logic.

I also would be curious to see benchmarks for them on FBSD and Linux, because FBSD and Linux (the platforms at large) diverged in how they handle "disks", with FBSD opting for only character devices (unbuffered) and Linux only block devices (buffered).
