by rarrrrrr 3379 days ago
Yes, of course there's some traffic analysis that would be possible, as there would be with any such service. But for the record: we keep logs for a limited time, and we don't just encrypt each file individually.

Instead there's an encrypted journal and encrypted data blocks. (Having the additional layer of data blocks allows for better deduplication from one version of a file to the next.) So for each transaction that's uploaded to the servers, we know that the journal gets longer, and that data blocks are added or removed (or both).
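
As a rough illustration of why a block layer helps deduplication between versions, here is a minimal Python sketch. It is not SpiderOak's actual code: the fixed-size blocks, SHA-256 block IDs, in-memory `store` dict, and the names `split_blocks` and `upload_version` are all assumptions made for the example, and encryption is left out entirely.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow (assumption)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_version(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Add one file version to `store`; return its block IDs and how many blocks were new."""
    block_ids = []
    new_blocks = 0
    for block in split_blocks(data):
        block_id = hashlib.sha256(block).hexdigest()
        if block_id not in store:
            # Hypothetical: a real client would encrypt the block before upload.
            store[block_id] = block
            new_blocks += 1
        block_ids.append(block_id)
    return block_ids, new_blocks


store: dict[str, bytes] = {}
v1_ids, v1_new = upload_version(b"aaaabbbbcccc", store)
v2_ids, v2_new = upload_version(b"aaaaXXXXcccc", store)  # only the middle block changed
```

In this toy run the second version shares its first and last blocks with the first, so only one new block has to be stored.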

All the database work for keeping track of the data blocks (reference accounting, garbage collection) is done client side. More details in this post from 2009: https://spideroak.com/articles/why--how-spideroak-architectu...
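
To make "reference accounting" and "garbage collection" concrete, here is a small Python sketch of what that client-side bookkeeping could look like. The `BlockIndex` class and its methods are hypothetical names, and the plain reference-counting scheme is an assumption, not a description of SpiderOak's client.

```python
from collections import Counter


class BlockIndex:
    """Hypothetical client-side index: which blocks each file version references."""

    def __init__(self) -> None:
        self.refcounts = Counter()  # block ID -> number of references
        self.versions = {}          # version ID -> list of block IDs

    def add_version(self, version_id: str, block_ids: list[str]) -> None:
        """Record a new version and count one reference per block it uses."""
        self.versions[version_id] = list(block_ids)
        self.refcounts.update(block_ids)

    def drop_version(self, version_id: str) -> None:
        """Forget a version and release its block references."""
        for block_id in self.versions.pop(version_id):
            self.refcounts[block_id] -= 1

    def collect_garbage(self) -> set[str]:
        """Return block IDs that no version references any more, and forget them."""
        dead = {b for b, n in self.refcounts.items() if n <= 0}
        for b in dead:
            del self.refcounts[b]
        return dead


index = BlockIndex()
index.add_version("v1", ["a", "b", "c"])
index.add_version("v2", ["a", "x", "c"])
index.drop_version("v1")
dead = index.collect_garbage()  # blocks the client could now ask the server to delete
```

Because the counting happens on the client, the server in this sketch would only ever see the resulting "delete these block IDs" request, not the reason behind it.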

2 comments

This is pretty impressive. I thought that you only encrypt content and filename. But this goes way beyond what I expected from such a service.
Right, some leakage is inherent, and what you provide may be good enough or even the best you can reasonably do. However, there's a long history (even in academic crypto) of seemingly insignificant leakage turning out to be important. So it's good to be overt about it.

So it looks like you are doing blockwise encryption? Which means that, at least conceptually, you leak not only when a file is updated but also which chunk? At least, I'm assuming the journal isn't append-only.

For what it's worth, I think of a journal as append-only by definition, and that's what SpiderOak does. Unless you have millions of very small files, the journal is going to be tiny relative to the backup content, so this is fine.

So the server doesn't have a concept of "an existing file was updated" vs "a new file was uploaded" etc. The server only knows "new blocks have arrived." All the "smarts" are on the client.

In general operation, only new journal entries and new blocks are added. The only time blocks are removed is when the user intentionally chooses to remove data (we call that operation "purge"). Intentional purges can also reduce the total size of the journal, and this is the only operation that does so.
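
A toy Python model of what the server can observe under this scheme might look like the following. Everything here is an assumption for illustration: the `ServerView` class, its method names, and in particular the idea that a purge replaces the journal with a shorter one are guesses, since the thread does not say how purge is implemented.

```python
class ServerView:
    """Hypothetical model of server-visible state: opaque journal entries and block IDs."""

    def __init__(self) -> None:
        self.journal = []    # encrypted entries; the server cannot read them
        self.blocks = set()  # opaque block IDs

    def apply_transaction(self, entries, added_blocks) -> None:
        """Normal operation: the journal only grows and blocks are only added."""
        self.journal.extend(entries)
        self.blocks |= set(added_blocks)

    def purge(self, kept_entries, removed_blocks) -> None:
        """User-initiated purge: the one operation that can shrink the journal (assumed mechanics)."""
        self.journal = list(kept_entries)
        self.blocks -= set(removed_blocks)


server = ServerView()
server.apply_transaction([b"e1", b"e2"], {"blk1", "blk2"})
server.apply_transaction([b"e3"], {"blk3"})
size_before = len(server.journal)
server.purge(kept_entries=[b"e1", b"e3"], removed_blocks={"blk2"})
size_after = len(server.journal)
```

In this model the server sees the journal grow with every transaction and shrink only on a purge, without learning whether a given transaction was a new file or an update to an existing one.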

Most backup software removes previous versions and deleted files after 30 days, but SpiderOak keeps these indefinitely by default, to allow for point-in-time recovery, restoring after ransomware infections, fixing mistakes you don't catch right away, etc. You can set a different retention policy if you prefer.