Hacker News new | ask | show | jobs
by unabridged 3173 days ago
This can't be too hard to build:

1. Generate 16TB of random data, backup/replicate many times

2. Think of data as 16 billion 1k pieces

3. Generate 64 random piece addresses using hashA(key) as seed

4. Concatenate the 64 pieces into one 64k chunk, and store hashB(chunk)

3 comments

Our algorithm is close to that;

1. We don't partition it into fixed size blocks, but rather index directly into the array

2. The site calculates a salted hash and sends us just the hash. We recommend at least a 32 byte CS-PRNG salt

3. We HMAC the hash with a 64-byte site-specific token (AppID) to produce the seed

4. We generate 64 uniformly distributed locations from the seed and perform 64 reads of 64 bytes each to form a 4096 byte buffer which we HMAC with the AppID to produce a second salt.

5. The site uses this second salt to HMAC their original hash, and store that.

This design allows multiple sites to securely share a single data pool and also means that our service a) does not see usernames or passsords, b) does not know if a login is valid/invalid, c) cannot do anything to make an invalid login look valid to the site.

There are some additional details to handle upgrading hashes as the data pool grows, and also to provide virtual private data pools for each site (so I can give you a copy of your data pool if you ever want to self-host). This is all detailed in [1] above.

That is an excellent idea. But why 16TB of random data ? Why not encrypt some high entropy value (digits of pi, whatever) with a 100 character password and generate 16TB like that. You then use the 16TB as a password but you could regenerate and recover using a scrap of paper.
You can do either. But if you generate the data pool from a seed that you retain, then you're back to trying to protect a 256-bit value from leaking.

Generating the data pool with constantly cycled and discarded keys (i.e. /dev/urandom) means the only way to have the pool is to go and get every single bit of it.

We went the second route because I like sleeping at night and it just felt like retaining a seed would defeat the whole purpose of bounded retrieval.

Sure, but that's a 256-bit value that does not have to be present at the use point. So it's a lightweight anchor ! It's extremely heavy when someone else tries to move it, and yet when you move it yourself, it easily fits in your wallet on the tiniest of sd cards, or even on a scrap of paper.
How about this? Take the old Blowfish block encryption algorithm and eliminate the key expansion and expand it so that the s-boxes and p-array take up 16TB of data? What you'd wind up with is a block cipher that has a 16TB key. Since Blowfish is clearly "prior art," and is unencumbered by patents, this might make this approach harder to attack using patent law.