Hacker News
by kev009 3694 days ago
You really shouldn't run non-CoW file systems above 90%, to include UFS and ext

Agreed. I don't think anyone is arguing that running a filesystem that full is a good idea.

What I believe, and what I think others have also concluded, is that it shouldn't be fatal. That is, when the dust has settled and you trim down usage and have a decent maintenance outage, you should be able to defrag the filesystem and get back to normal.

That's not possible with ZFS because there is no defrag utility ... and I have had it explained to me in other HN threads (although not convincingly) that it might not be possible to build a proper defrag utility.

My understanding is that the way to defrag ZFS is to do a send and receive. Combined with incremental snapshotting, this should actually be realistic with almost no downtime for most environments.
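To make that concrete, here is a rough sketch of the command sequence, written as a Python dry run that only builds the command strings and executes nothing. The dataset names (`tank/data`, `tank/data_new`) and snapshot names are made up for illustration, and the cutover at the end (stop writers, ship a final incremental, swap names) is an assumption about how one might finish the job, not a tested procedure.

```python
def defrag_plan(src, dst, snaps=("defrag1", "defrag2")):
    """Return shell commands that rewrite dataset `src` into `dst` via send/receive.

    Nothing is executed here; the caller decides whether and how to run them.
    All names are hypothetical placeholders.
    """
    first, last = snaps
    return [
        # Bulk copy while src stays online and in use.
        f"zfs snapshot {src}@{first}",
        f"zfs send {src}@{first} | zfs receive {dst}",
        # Short outage: stop writers, then ship only the changes since the first snapshot.
        f"zfs snapshot {src}@{last}",
        f"zfs send -i {src}@{first} {src}@{last} | zfs receive -F {dst}",
        # Swap names so consumers see the freshly written copy.
        f"zfs rename {src} {src}_old",
        f"zfs rename {dst} {src}",
    ]


for cmd in defrag_plan("tank/data", "tank/data_new"):
    print(cmd)
```

The old copy (`tank/data_old` in this sketch) would still need to be destroyed afterwards to get the space back, and on a busy dataset you would probably want more than one incremental round before the final cutover.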

Doing so requires that you have enough zfs filesystems in your pool (or enough independent pools) that you have the free space to temporarily have two copies of the filesystem.
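As a back-of-the-envelope check on that space requirement, something like the following would do. It is a simplification: the function name and the numbers are made up, the 90% ceiling is just the figure from the top of the thread, and it ignores compression, existing snapshots, and reservations, all of which change how much space a second copy really takes.

```python
def can_hold_second_copy(pool_size, pool_used, dataset_used, max_fill=0.90):
    """True if a temporary second copy of the dataset keeps the pool at or below max_fill.

    All sizes must be in the same unit (GiB here). Hypothetical helper for illustration only.
    """
    return pool_used + dataset_used <= max_fill * pool_size


# 10000 GiB pool, 7000 GiB used: a 1500 GiB dataset fits under the ceiling, a 2500 GiB one does not.
print(can_hold_second_copy(10000, 7000, 1500))
print(can_hold_second_copy(10000, 7000, 2500))
```

This is also the argument for splitting a pool into many smaller filesystems: the smaller the largest dataset, the less headroom the rewrite needs at any one time.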

"Doing so requires that you have enough zfs filesystems in your pool (or enough independent pools) that you have the free space to temporarily have two copies of the filesystem."

Yes, and that is why I did not mention recreating the pool as a solution. If your pool is big enough or expensive enough, that's still "fatal".

You ought to define what is fatal here. The worst that I have seen reported at 90% full is a factor of 2 slowdown on sequential reads off mechanical disks, which is acceptable to most people. Around that point, sequential writes should also suffer similarly from writes going to the innermost tracks.

(1) I'm not proposing recreating the pool - I'm proposing an approach to incrementally fixing the pool in an entirely online manner.

(2) If your pool is big enough/expensive enough, surely you've also budgeted for backups.

(1) Regardless of what you call it, it means having enough zpool somewhere else to zfs send the entire (90% full) affected zpool off to ... that might be impossible or prohibitively expensive depending on the size of the zpool.

(2) This has nothing to do with backups or data security in any way - it's about data availability (given a specific performance requirement).

You're not going to restore your backups to an unusable pool - you're going to build or buy a new pool and that's not something people expect to have to do just because they hit 90% and churned on it for a while.

You can send/receive to the same zpool and still defrag. With careful thought, this can be done incrementally and with very minimal availability implications.

I agree it's not ideal to have filesystems do this, but it also simplifies a lot of engineering. And I think direct user exposure to a filesystem with a POSIX-like interface is a paradigm mostly on the way out anyway, meaning it's increasingly feasible to design systems to not exceed a safe utilization threshold.

This does work.

On UNIX, there are two defragmentation utilities:

`tar` and `zfs send | zfs recv`.