by rzzzt
486 days ago
I experimented with a similar, "hardlink farm"-style approach for deduplicated, browseable snapshots. It resulted in a small bash script which did the following:
- compute SHA256 hashes for each file on the source side
- copy files which are not already known to a "canonical copies" folder on the destination (this step uses the hash itself as the file name, which makes it easy to check whether I already had a copy of the same file)
- mirror the source directory structure to the destination
- create hardlinks in the destination directory structure for each source file; these use the original file name but point to the canonical copy.
Then I got too scared to actually use it :)
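For illustration, the four steps above could look roughly like this in Python. This is a minimal sketch, not the original bash script: the `canonical` folder name and the `sha256_of` and `snapshot` helpers are made up for the example, symlinks and special files are simply skipped, and it assumes the canonical folder and the snapshots live on the same filesystem (hardlinks cannot cross filesystems).

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(source: Path, dest_root: Path, name: str) -> Path:
    """Create dest_root/<name> as a hardlink farm mirroring `source`."""
    store = dest_root / "canonical"  # hypothetical name for the "canonical copies" folder
    snap = dest_root / name
    store.mkdir(parents=True, exist_ok=True)

    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        # Mirror the source directory structure.
        (snap / rel).mkdir(parents=True, exist_ok=True)

        for filename in filenames:
            src_file = Path(dirpath) / filename
            if src_file.is_symlink() or not src_file.is_file():
                continue  # assumption: only regular files are snapshotted

            # The hash doubles as the file name of the canonical copy.
            canonical = store / sha256_of(src_file)
            if not canonical.exists():
                # Copy to a temporary name first so an interrupted copy
                # never leaves a truncated file under a valid hash name.
                partial = canonical.with_name(canonical.name + ".partial")
                shutil.copy2(src_file, partial)
                os.replace(partial, canonical)

            # Original file name, pointing at the canonical copy.
            os.link(canonical, snap / rel / filename)
    return snap
```

One reason to be cautious with such a layout: every hardlink to a canonical copy shares the same inode, so editing a file in place inside any snapshot changes it in all of them. Snapshots made this way are best treated as read-only.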
In any case, a good design is to let user space find the duplicate candidates and then ask the kernel to do the dedupe step. The kernel can double-check for you that the ranges are really identical before sharing them. This is available on Linux as the ioctl BTRFS_IOC_FILE_EXTENT_SAME (exposed generically as FIDEDUPERANGE since Linux 4.5).
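As a rough sketch of what that call looks like from user space, here is one way to issue the dedupe ioctl from Python. The struct layout follows `struct file_dedupe_range` from `linux/fs.h`; the request number below is the value on x86 and ARM and should be treated as an assumption on other architectures. `build_request` and `dedupe_range` are names invented for this example, and the call only succeeds on a filesystem that supports it (btrfs, or XFS with reflink enabled).

```python
import fcntl
import os
import struct

# _IOWR(0x94, 54, struct file_dedupe_range). Same number as
# BTRFS_IOC_FILE_EXTENT_SAME. Assumption: x86/ARM ioctl encoding.
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset: int, length: int, dest_fd: int, dest_offset: int) -> bytearray:
    """Pack a dedupe request with a single destination range."""
    return bytearray(
        HEADER.pack(src_offset, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    )


def dedupe_range(src_path, dest_path, length, src_offset=0, dest_offset=0):
    """Ask the kernel to share `length` bytes of src with dest.

    Returns (ranges_were_identical, bytes_deduped). The kernel compares
    the two ranges itself and leaves dest untouched if they differ.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    dest_fd = os.open(dest_path, os.O_RDWR)
    try:
        request = build_request(src_offset, length, dest_fd, dest_offset)
        # mutate_flag=True: the kernel writes the per-destination result
        # back into our buffer.
        fcntl.ioctl(src_fd, FIDEDUPERANGE, request, True)
        _fd, _off, bytes_deduped, status, _reserved = INFO.unpack_from(request, HEADER.size)
    finally:
        os.close(dest_fd)
        os.close(src_fd)
    if status < 0:
        # Negative status is a per-destination errno.
        raise OSError(-status, os.strerror(-status))
    return status == FILE_DEDUPE_RANGE_SAME, bytes_deduped
```

The kernel may dedupe fewer bytes than requested, so a real tool would check `bytes_deduped` and loop over the remainder. In practice, tools like duperemove already wrap this ioctl.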