by CiPHPerCoder 2855 days ago
H(ssn) just kicks the problem downstream.

  - If H is a simple cryptographic hash function, it's not resistant
    to brute-force attacks to recover the SSN
  - It's not revokable
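
To make the brute-force point concrete, here is a minimal sketch, assuming H is unsalted SHA-256 over the 9-digit string. The function names and the sample number are hypothetical placeholders, and the demo searches a narrowed range only so that it finishes quickly; the full space is just 10**9 candidates.

```python
import hashlib
from typing import Optional


def hash_ssn(ssn: str) -> str:
    """Naive H(ssn): unsalted SHA-256 of the 9-digit string."""
    return hashlib.sha256(ssn.encode("ascii")).hexdigest()


def recover_ssn(target_hash: str, start: int = 0, stop: int = 10**9) -> Optional[str]:
    """Try every 9-digit candidate in [start, stop) until one hashes to the target."""
    for candidate in range(start, stop):
        ssn = f"{candidate:09d}"
        if hash_ssn(ssn) == target_hash:
            return ssn
    return None


# Placeholder value, not a real person's number.
leaked = hash_ssn("123456789")
print(recover_ssn(leaked, start=123_400_000, stop=123_500_000))
```

Dropping the `start`/`stop` arguments searches the whole space, which is slow in pure Python but well within reach of commodity hardware.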
What we need is something more akin to a Credit Card number. Something like an abstraction layer. It might even be implementable as a UUID.

If you need to revoke it, you can do so since it's not cryptographically tied to anything.
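
A rough sketch of what I mean, assuming an in-memory lookup table stands in for whatever datastore the issuer would really use. `AliasVault` and its methods are made-up names for illustration, not an existing API.

```python
import uuid
from typing import Dict, Optional


class AliasVault:
    """Maps random UUID aliases to SSNs; the alias carries no information about the SSN."""

    def __init__(self) -> None:
        self._by_alias: Dict[str, str] = {}

    def issue(self, ssn: str) -> str:
        """Create a fresh random alias for this SSN (one per vendor, if desired)."""
        alias = str(uuid.uuid4())
        self._by_alias[alias] = ssn
        return alias

    def resolve(self, alias: str) -> Optional[str]:
        """Return the SSN behind an alias, or None if it is unknown or revoked."""
        return self._by_alias.get(alias)

    def revoke(self, alias: str) -> bool:
        """Delete an alias; other aliases for the same SSN keep working."""
        return self._by_alias.pop(alias, None) is not None


vault = AliasVault()
alias_for_vendor_a = vault.issue("123456789")  # placeholder value
alias_for_vendor_b = vault.issue("123456789")
vault.revoke(alias_for_vendor_a)               # vendor B's alias is unaffected
```

Because each alias is random, revoking one is just deleting a row. Nothing has to be re-derived or re-keyed.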

Failing that, a base32-encoded random string (without = padding) with an optional checksum would do the trick.
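
Something along these lines, as a sketch only. The 15 random bytes, the 5-byte truncated SHA-256 checksum, and the function names are my own assumptions, chosen so the result is exactly 32 base32 characters. The checksum only catches typos; it adds no security.

```python
import base64
import hashlib
import hmac
import secrets

RANDOM_BYTES = 15  # assumed size: 120 bits of randomness
CHECK_BYTES = 5    # assumed size: truncated SHA-256, for typo detection only


def new_token() -> str:
    """Random identifier: base32(random bytes + checksum), with '=' padding stripped."""
    raw = secrets.token_bytes(RANDOM_BYTES)
    check = hashlib.sha256(raw).digest()[:CHECK_BYTES]
    return base64.b32encode(raw + check).decode("ascii").rstrip("=")


def is_well_formed(token: str) -> bool:
    """Check the embedded checksum; says nothing about whether the token was ever issued."""
    padded = token + "=" * (-len(token) % 8)  # restore padding before decoding
    try:
        data = base64.b32decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    if len(data) != RANDOM_BYTES + CHECK_BYTES:
        return False
    raw, check = data[:RANDOM_BYTES], data[RANDOM_BYTES:]
    expected = hashlib.sha256(raw).digest()[:CHECK_BYTES]
    return hmac.compare_digest(check, expected)


token = new_token()
print(token, is_well_formed(token))
```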

3 comments

(disclaimer: I work at VGS)

We offer a variety of format-preserving aliasing algorithms. Typically only legacy systems choose SSN-shaped aliases, because they have fixed-width columns in their RDBMS that are difficult to change (imagine petabytes of data).

The idea behind format-preserving aliases is actually based on the NIST SP 800-38G standard[1]. We use FF1 and are actively engaging with the world's leading cryptographers such as: https://cryptoonline.com/publications/.

Happy to share more in detail if there's interest. Please email me: mahmoud @ ${COMPANY_NAME}.com

[1] https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S...

I'd like to make but one appeal to everyone reading this thread:

Ask your cryptographer if the algorithm they're proposing you use is IND-CCA2 secure (especially if it meets the criteria for IND-CCA3).

Litmus test: If they don't know what that means, you shouldn't be trusting them for cryptography advice.

If it isn't IND-CCA2 secure, you shouldn't be using it. Full stop.

For the curious: https://tonyarcieri.com/all-the-crypto-code-youve-ever-writt...

The IND in IND-CCA2 means "INDistinguishable"; i.e. from randomly generated line noise. For symmetric cryptography, your ciphertext shouldn't have any structure to it. (Lattices and such are a different story. If structure is permissible for your security goals, you're probably doing asymmetric cryptography anyway.)

To be clear: format-preserving, order-preserving, order-revealing, and homomorphic encryption, while an exciting research area, fail to meet this requirement and should not be used for non-experimental purposes until the techniques have had time to mature. And even then, until they meet this requirement, they should be used only when the threat model doesn't realistically include the possibility of adaptive chosen-ciphertext attacks. (Spoiler: A real-world threat model will almost certainly always include that.)

> We use FF1 and are actively engaging with the world's leading cryptographers

I've seen this "we engage with the world's leading cryptographers" genre of claim before, albeit from a much more arrogant source: https://news.ycombinator.com/item?id=6916860

Please provide an example of an RDBMS with petabytes of data. Seems unlikely.

NASDAQ had a 2-petabyte Microsoft SQL Server.

https://customers.microsoft.com/en-us/story/nasdaq-omx-group...

The article says the token "maps" to the SSN, and since they want to give different tokens to different vendors using VGS, I'd assume they're either wholly random tokens associated in a database somewhere or that some other factor of randomness is added in.

But the issue I see is that the user still has to hand, say, their SSN to a website so that it can request the token associated with it, which is a big risk point, because they need to identify themselves in a way that picks out the correct VGS account to talk to.

I mean, I think really you'd be better off doing a private/public key thing, where you have some sort of device that gives a sub-key of your master identity key to the vendor?

Salt?

Exactly, it seems way too complex. I don't know why my insurance company can't give me a 9-digit number that is HASH(SSN + member_id) and tell me to use that instead of my SSN.

It also would need to be salted.
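
For what it's worth, a minimal sketch of the HASH(SSN + member_id) idea, assuming the "salt" is really a secret key held by the insurer and the hash is HMAC-SHA256. `member_alias` and the `|` separator are hypothetical choices, not anything the insurer or VGS is known to use. Without a secret, anyone who knows the member_id can simply try all 10**9 SSNs.

```python
import hashlib
import hmac
import secrets


def member_alias(secret_key: bytes, ssn: str, member_id: str) -> str:
    """Derive a 9-digit alias from a keyed hash of the SSN and member ID."""
    # The separator keeps ("12", "3") and ("1", "23") from producing the same message.
    message = f"{ssn}|{member_id}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    # Reduce 64 bits of the digest to nine decimal digits, zero-padded.
    return f"{int.from_bytes(digest[:8], 'big') % 10**9:09d}"


insurer_key = secrets.token_bytes(32)  # assumed to stay on the insurer's side
print(member_alias(insurer_key, "123456789", "M-0001"))  # placeholder inputs
```

Two caveats with this sketch: with only 10**9 possible outputs, collisions between members become likely once there are tens of thousands of them, and the alias can only be revoked by changing the key or the member_id.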