by meejah 2656 days ago
Tahoe-LAFS does "erasure coding" on the chunks of data. This increases the size of the data (adding redundancy) so that you can recover a file without recovering every single chunk. The encoding parameters (how many chunks are produced and how many are needed to recover) are chosen client-side. Even in the smallest possible case (i.e. every chunk is required), there is some slight overhead from the zfec and Tahoe headers.

If you are using redundancy of any kind, it will inflate the size of the ciphertext relative to the plaintext, which in turn affects sync speed.
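To put rough numbers on that inflation, here is a minimal sketch of the arithmetic. The function name `encoded_size` is made up for illustration and is not part of Tahoe or zfec; it ignores the header overhead mentioned above. The 3-of-10 setting in the example is, as far as I know, the stock default, but treat that as an assumption and check your own configuration.

```python
def encoded_size(plaintext_bytes: int, needed: int, total: int) -> int:
    """Approximate bytes uploaded under k-of-N erasure coding (headers ignored).

    needed -- how many shares are required to recover the data (k)
    total  -- how many shares are produced and uploaded (N)
    """
    if not 1 <= needed <= total:
        raise ValueError("need 1 <= needed <= total")
    # Each share holds roughly 1/needed of the data (rounded up),
    # and 'total' such shares are sent to the storage servers.
    share_size = -(-plaintext_bytes // needed)  # ceiling division
    return share_size * total


one_gb = 1_000_000_000
print(encoded_size(one_gb, 3, 10))    # 3-of-10: about 3.3x the plaintext
print(encoded_size(one_gb, 10, 10))   # every share required: no redundancy
```

The expansion factor is simply total/needed, so setting needed equal to total gives the "zero redundancy" case.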

Tahoe-LAFS does split everything up into fixed-size chunks, though, so the total size of the file doesn't really matter -- it will still be uploaded in 128 KiB (default) chunks to the storage servers.
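As a rough illustration of the chunking, the sketch below splits a file length into fixed-size pieces. `SEGMENT_SIZE` and `segment_lengths` are names I made up for this example, and the 128 KiB value is just the default mentioned above; this is not Tahoe's actual code.

```python
SEGMENT_SIZE = 128 * 1024  # 128 KiB, assumed default chunk size


def segment_lengths(file_bytes: int, segment_size: int = SEGMENT_SIZE) -> list:
    """Return the length of each chunk a file of the given size splits into."""
    full, rest = divmod(file_bytes, segment_size)
    # All chunks are full-size except possibly a shorter final one.
    return [segment_size] * full + ([rest] if rest else [])


chunks = segment_lengths(2**30)  # a 1 GiB file
print(len(chunks))               # number of chunks to upload
```

Whatever the file size, each individual upload unit stays the same size; a bigger file just means more of them.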

So it's not the encryption that has an impact but the erasure coding (which gives the "RAID-like" features). You can configure it to have zero redundancy, which leaves only a slight increase in the total amount of data to send.

1 comment

Hadn't even thought about a difference in size; I was thinking of the CPU overhead. If I save a 1GB file, how much processor time will it take to re-encrypt the whole thing so it can be sent off? Or does the chunking apply here too, i.e. does only the chunk of the file that's changed have to be re-encrypted?

I don't know the exact answer to that, but "not much" in comparison to the time to send the bytes over the network. The actual contents are encrypted using AES, which modern processors often support with built-in instructions, so it is very fast. The vast majority of the time here is spent uploading.

Tahoe does use "convergent encryption" (basically, the key is based on the contents) so that the same file encrypted by the same client results in the same ciphertext (and thus doesn't need to be re-uploaded).
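The idea of a content-derived key can be sketched like this. It is a simplified illustration only: `convergent_key` and the per-client `convergence_secret` parameter are names I chose, and Tahoe's real key derivation differs in its details (I have not reproduced it here).

```python
import hashlib


def convergent_key(plaintext: bytes, convergence_secret: bytes) -> bytes:
    """Derive an encryption key from the file contents plus a client secret.

    Simplified sketch, not Tahoe's actual derivation.
    """
    h = hashlib.sha256()
    # Length-prefix the secret so secret/content boundaries are unambiguous.
    h.update(len(convergence_secret).to_bytes(8, "big"))
    h.update(convergence_secret)
    h.update(plaintext)
    return h.digest()[:16]  # 16 bytes, enough for an AES-128 key


data = b"the same file contents"
k1 = convergent_key(data, b"client-A-secret")
k2 = convergent_key(data, b"client-A-secret")
k3 = convergent_key(data, b"client-B-secret")
print(k1 == k2)  # same client, same contents: same key
print(k1 == k3)  # different client secret: different key
```

Same key plus same plaintext gives the same ciphertext, which is why an unchanged file doesn't need to be uploaded again by the same client.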

I believe that only happens at the "capability" (i.e. file) level, though, not for each chunk. So, if you had a directory of 10 files of 100MB each and changed one, you'd only have to upload the new directory-descriptor and the one changed file -- but if you change a few bytes of a 1GB file, you'd have to upload all the ciphertext for that file again.
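Here is a toy model of that file-level granularity, under the assumption described above (whole files are the unit of deduplication, not chunks). `bytes_to_reupload` is a made-up helper, it ignores the small directory-descriptor upload, and the sizes are scaled down so it runs quickly.

```python
import hashlib


def bytes_to_reupload(old: dict, new: dict) -> int:
    """Count bytes that must be uploaded when deduplicating whole files.

    'old' and 'new' map file names to contents (bytes). Any file in 'new'
    whose contents were not already stored has to be uploaded in full.
    """
    stored = {hashlib.sha256(data).digest() for data in old.values()}
    return sum(
        len(data)
        for data in new.values()
        if hashlib.sha256(data).digest() not in stored
    )


# Ten separate files, one of them edited: only that file goes up again.
old_dir = {f"file{i}": bytes([i]) * 100_000 for i in range(10)}
new_dir = dict(old_dir)
new_dir["file3"] = b"x" + old_dir["file3"][1:]
print(bytes_to_reupload(old_dir, new_dir))  # 100000

# One big file with a single byte changed: the whole file goes up again.
old_big = {"big": bytes(1_000_000)}
new_big = {"big": b"x" + bytes(999_999)}
print(bytes_to_reupload(old_big, new_big))  # 1000000
```

Both directories hold the same total amount of data, but the edit costs ten times as much upload in the single-file layout.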

Thanks for the well-informed answers!