Hacker News
by codetrotter 2023 days ago
This is huge! And very exciting :D

One thing I am wondering about is this:

> Redacted zfs send/receive - Redacted streams allow users to send subsets of their data to a target system. This allows users to save space by not replicating unimportant data within a given dataset or to selectively exclude sensitive information. #7958

Let’s say I have a dataset tank/music-video-project-2020-12 or something, and it is about 40 GB, and I want to send a snapshot of it to a remote machine over an unreliable connection. Can I use the redacted send/recv functionality to send the dataset a chunk at a time and, at the end, have a perfect copy of it that I can then send incremental snapshots to?

3 comments

zfs send supports a resume token (-t) for resuming an interrupted stream, as long as the receiving side was run with -s. Just use normal send/receive until you have the full stream sent.
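To make the resume flow concrete, here is a minimal Python sketch that only builds the command lines and does not run them. The function names are made up for illustration. The flags are the ones mentioned above (`zfs receive -s`, `zfs send -t`), plus the `receive_resume_token` property, which as far as I know reads as `-` when there is nothing to resume.

```python
# Sketch only: builds argv lists, never executes anything.
# Function names are hypothetical; the zfs flags used are -s, -t and the
# receive_resume_token property.

def receive_cmd(target):
    """Receiving side: -s keeps partial state if the stream is interrupted."""
    return ["zfs", "receive", "-s", target]


def token_query_cmd(target):
    """Query the resume token of a partially received dataset ("-" if none)."""
    return ["zfs", "get", "-H", "-o", "value", "receive_resume_token", target]


def next_send_cmd(snapshot, token):
    """Choose a fresh full send or a resumed send, depending on the token."""
    token = token.strip()
    if token in ("", "-"):
        # Nothing was partially received, so start from the beginning.
        return ["zfs", "send", snapshot]
    # A token exists, so continue the interrupted stream where it stopped.
    return ["zfs", "send", "-t", token]
```

The idea would be to run the token query on the remote side after each dropped connection, feed its output to `next_send_cmd`, and repeat until the receive completes.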
I think it's meant more for cases where you don't want to send scratch or cached files: you can have them left out of the snapshot being sent.

> Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots).

Think of it like a selective sync in Dropbox or SyncThing at the FS level.
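For what it's worth, the staged process quoted above could be planned roughly like this. It is a sketch in Python that only lists the commands in order. The function name, the `@redact` snapshot name, and the idea of deleting paths under the clone's mountpoint with `rm` are my assumptions. The `zfs redact` and `zfs send --redact` forms are how I understand the OpenZFS 2.0 man pages, so check them against your version.

```python
# Sketch only: returns the commands in order, it does not execute them.
# redacted_send_plan and the "@redact" snapshot name are hypothetical.

def redacted_send_plan(snapshot, clone, unwanted_paths, bookmark):
    """List the steps for a redacted send of `snapshot`.

    unwanted_paths are assumed to be paths inside the clone's mountpoint.
    """
    redaction_snapshot = f"{clone}@redact"
    plan = [["zfs", "clone", snapshot, clone]]  # clone the snapshot
    for path in unwanted_paths:
        # Remove the data that should not reach the target.
        plan.append(["rm", "-rf", "--", path])
    # Snapshot the cleaned-up clone to get the redaction snapshot.
    plan.append(["zfs", "snapshot", redaction_snapshot])
    # Record which blocks to withhold as a redaction bookmark.
    plan.append(["zfs", "redact", snapshot, bookmark, redaction_snapshot])
    # Send the original snapshot minus the redacted blocks.
    plan.append(["zfs", "send", "--redact", bookmark, snapshot])
    return plan
```

Calling it with something like `redacted_send_plan("tank/proj@full", "tank/proj-redact", ["/tank/proj-redact/cache"], "book1")` gives five commands; the last one produces the stream you would pipe to `zfs receive` on the target.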

That's a protocol problem; use a protocol such as rsync. You don't need to use redacted sends/recvs.
rsync doesn't scale like zfs send/recv. It requires scanning of every file at both the source and destination to compute the delta to send. zfs snapshots and send/recv don't need to do that. The delta is already fully described by the snapshots themselves. zfs is also working with immutable snapshots. It guarantees the source and destination copies are identical; rsync can't do much about the source and destination being modified while it is running since it's reliant upon other users of the system not touching the data being synced.

That's not to say rsync doesn't work. It does. But it doesn't scale well, and the data integrity guarantees aren't there.
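To illustrate the point about the delta being described by the snapshots themselves: an incremental send only needs the names of two snapshots, with no file scanning on either side. A small Python sketch that builds the command follows. The function name is hypothetical, and the same-dataset check is my simplification, since as far as I know zfs also accepts a bookmark or a clone's origin as the incremental source.

```python
# Sketch only: builds the argv list for an incremental send.
# incremental_send_cmd is a hypothetical helper name.

def incremental_send_cmd(prev_snapshot, new_snapshot):
    """Build `zfs send -i prev new` for two snapshots of one dataset."""
    prev_ds, _, prev_name = prev_snapshot.partition("@")
    new_ds, _, new_name = new_snapshot.partition("@")
    if not prev_name or not new_name:
        raise ValueError("expected names of the form dataset@snapshot")
    if prev_ds != new_ds:
        # Simplification: real zfs is more permissive than this check.
        raise ValueError("both snapshots must belong to the same dataset")
    # -i sends only the blocks that changed between the two snapshots.
    return ["zfs", "send", "-i", prev_snapshot, new_snapshot]
```

The output stream would then be piped to `zfs receive` on the destination, which must already hold the earlier snapshot.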

rsync has its own issues if the connection has high latency, though. zfs send was originally developed by a Sun engineer who wanted to speed up large transfers to servers in China, if I recall correctly.
+1 for rsync, but with checksumming turned on; I think that's acceptable for 40 GB.
rsync isn't really enough for ZFS, unfortunately. It won't move snapshots, bookmarks, etc.