|
This is really interesting, and something I've been thinking about for a while now. The SEMANTICS[1] doc details what is and isn't supported from a POSIX filesystem API perspective, and this stands out:

Write operations (write, writev, pwrite, pwritev) are not currently supported. In the future, Mountpoint for Amazon S3 will support sequential writes, but with some limitations:
Writes will only be supported to new files, and must be done sequentially.
Modifying existing files will not be supported.
Truncation will not be supported.
The sequential requirement for writes is the part I've been mulling over: is it actually required by S3? Last year I discovered that S3 can do transactional I/O via multipart upload[2] operations combined with the CopyObject[3] operation. This should, in theory, allow for out-of-order writes, reuse of existing partial objects, and file appends.

[1] https://github.com/awslabs/mountpoint-s3/blob/main/doc/SEMAN...
[2] https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuove...
[3] https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObje...
|
What I did is:
1. Create 10000 files, each of 1MB size, so that the total usage is 10GB.
2. Attach each file as a loopback block device using `losetup`.
3. Create a RAID device over the 10000 loopback devices with `mdadm --build --level=linear`. This RAID device appears as a single 10GB block device. `--level=linear` means the RAID device is just a concatenation of the underlying devices. `--build` means that mdadm does not store metadata blocks on the devices, unlike `--create`. I skip the metadata for three reasons: the metadata blocks would use up a significant portion of each 1MB device, I don't need mdadm to "discover" this device automatically, and the metadata superblock does not support 10000 devices anyway (the max is 2000 IIRC).
4. From here the 10GB block device can be used like any other block device. In my case I created a LUKS device on top of it, then an XFS filesystem on top of the LUKS device, and that XFS filesystem is my backup directory.
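The steps above can be sketched as a dry-run command builder. This is only an illustration, not the exact commands used: the chunk file names, `/dev/md0`, the `backup` mapping name, and the mount point are all hypothetical, and it assumes loop device nodes `/dev/loop0` through `/dev/loopN-1` already exist and are free. Nothing is executed; the function only returns argument lists.

```python
from pathlib import Path


def build_commands(chunk_dir, count=10_000, md_dev="/dev/md0",
                   luks_name="backup", mount_point="/mnt/backup"):
    """Return the setup commands as argv lists, without running anything."""
    chunk_dir = Path(chunk_dir)
    # Hypothetical naming: one loop device per chunk file, same index.
    loops = [f"/dev/loop{i}" for i in range(count)]
    cmds = []
    for i, loop in enumerate(loops):
        chunk = str(chunk_dir / f"chunk{i:05d}")
        # GNU truncate reads the "MB" suffix as 1000*1000 bytes (SI).
        cmds.append(["truncate", "-s", "1MB", chunk])
        # Attach the file to an explicit loop device.
        cmds.append(["losetup", loop, chunk])
    # Concatenate all loop devices; --build writes no md superblock.
    cmds.append(["mdadm", "--build", md_dev, "--level=linear",
                 f"--raid-devices={count}", *loops])
    # LUKS on top of the md device, then XFS on top of LUKS.
    cmds.append(["cryptsetup", "luksFormat", md_dev])
    cmds.append(["cryptsetup", "open", md_dev, luks_name])
    cmds.append(["mkfs.xfs", f"/dev/mapper/{luks_name}"])
    cmds.append(["mount", f"/dev/mapper/{luks_name}", mount_point])
    return cmds
```

Printing the result of `build_commands("/tmp/chunks", count=3)` is a cheap way to review the sequence before running anything as root.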
So any modification of files in the XFS layer eventually results in some of the 1MB blocks at the lowest layer being modified, and only those modified 1MB blocks need to be synced to the WebDAV server.
(Note: SI units. 1KB == 1000B, 1MB == 1000KB, 1GB == 1000MB.)
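To make the "only the modified blocks need syncing" point concrete, here is a minimal sketch of the mapping from a write on the concatenated device to the chunk files it touches. It assumes each chunk contributes exactly `chunk_size` bytes to the linear device, which ignores any sector rounding by the loop or md layers; `dirty_chunks` is a hypothetical helper, not part of any tool mentioned above. Offsets are on the raw 10GB device, so a write seen at the XFS layer lands at a different offset after LUKS and filesystem layout are applied.

```python
def dirty_chunks(offset, length, chunk_size=1_000_000):
    """Return the indices of the chunk files touched by a write.

    offset and length are in bytes on the concatenated block device.
    """
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset/length must be >= 0 and chunk_size > 0")
    if length == 0:
        return range(0)  # an empty write touches nothing
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size  # index of the last byte written
    return range(first, last + 1)


# A 4096-byte write in the middle of the third chunk touches one file.
print(list(dirty_chunks(2_500_000, 4096)))  # [2]
# A 2-byte write straddling a chunk boundary touches two files.
print(list(dirty_chunks(999_999, 2)))       # [0, 1]
```

So a small write dirties one or two 1MB files, and only those need to be re-uploaded.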