by user3939382 1575 days ago
This is probably a stupid question to those who work with these concepts often: can all the user data in the DB be hashed with the user’s password so that nothing is gained from a breach? Is this mostly a CPU resource problem, or would a JWT architecture preclude that from working? (I haven’t built auth systems for several years)
7 comments

You could encrypt it with the user’s password instead (rather than hashing it). This is also the approach taken by e.g. password managers, they use your password as a seed for encrypting all your data.
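To make the "password as a seed" part concrete, here is a minimal sketch of turning a password into an encryption key with a key derivation function. It uses only Python's standard library (PBKDF2-HMAC-SHA256). The function name `derive_key`, the 16-byte salt, and the iteration count are my own assumptions for illustration, not what any particular password manager does. The derived key would then be handed to a real cipher such as AES-GCM from a vetted library, which is not shown here.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a password (hypothetical helper).

    The salt is not secret; it is stored next to the ciphertext so the
    same key can be derived again at the next login.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt,
        iterations,
        dklen=32,
    )


# At signup: pick a random salt and store it with the user's record.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)

# At login: the same password and salt give the same key again.
same_key = derive_key("correct horse battery staple", salt)
```

Note that the server never needs to store `key` itself, only the salt, which is the whole point: without the password there is nothing to decrypt with.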

The problem is that this would make the database entirely inaccessible unless you have access to the password. That creates quite a lot of friction in the user experience: the user would have to provide their password on every interaction (i.e. not just when logging in).

Users wouldn't need to provide their password on every interaction; just when logging in. The browser could save a derived decryption key in a cookie or local storage and use that to persist the session.

We're basically just discussing end-to-end encryption.

The real reason it's not done more often is that it makes a lot of things way more complicated from a development perspective. Features like "allow users to send messages to each other" that would normally be really simple to implement suddenly require a whole public key infrastructure and logic to take into account edge cases like "What if the user got a new phone or changed their password and was offline when the message was sent?", or onerous threat models like "What if the server is controlled by an attacker when I sign in?"

Not exactly following. Couldn't DMs simply not be E2E encrypted while maintaining encryption for personal info?
End to end encrypted with what key? What if the user changed their password? What if they got a new phone? What if the server is only pretending the user got a new phone to trick you into leaking your messages?

All of those problems are solvable, but "simply" is hardly the word I'd use to describe designing a secure end-to-end encrypted application. It's way, way more development effort than just "hash user passwords with bcrypt and don't allow access without the password", which is why it's rarely done unless E2E encryption is a major selling point of the application.

Sorry, still not following. I wrote not E2E encrypted. I'm struggling to understand why messages that are not E2E encrypted would require key management.
Sorry, misread.

Yes, you could symmetrically encrypt the tiny portion of personal data that needs to be read solely by you without much added complexity.

However, with few exceptions (password managers, backups, personal notes, etc), the whole point of uploading data to an online service is to allow it to be shared with other people or services. Once that happens, you need all those complicated key management and security systems I just talked about. It's effectively end-to-end encryption.

The data is read by more than one person, so this likely wouldn't work.

Also, I'm not sure this is an actual breach. I think they accidentally published the data themselves, that's the vibe I'm getting from reading between the lines. It's like the code maybe missed checking a flag that would exclude private records from showing.

That would seem to work only if the user is interested solely in records they created themselves or that were explicitly shared with them. When sharing, both users' passwords would have to be stored somewhere, or else the raw content, so that it could be re-encrypted.

Public-key cryptography would be better; maybe encrypt a private key with a password and store that along with the public key?

The reason we can store and use password hashes is that the user provides their password every time they log in. So we hash the password they provided at login and compare that to the hash that was stored.

We can't determine what their password is based on the hash alone, which is why we couldn't hash all the user data in the DB with their password and store that.
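A rough sketch of that store-and-compare flow, using only the Python standard library. The function names and the scrypt parameters (`n=2**14, r=8, p=1`) are assumptions picked for illustration; a real system would more likely use a maintained password-hashing library.

```python
import hashlib
import hmac
import os
from typing import Tuple


def _scrypt(password: str, salt: bytes) -> bytes:
    # Slow, salted, one-way function: cheap to verify, hard to invert.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """At signup: return (salt, digest), which is all the DB stores."""
    salt = os.urandom(16)
    return salt, _scrypt(password, salt)


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """At login: re-hash the submitted password and compare digests."""
    candidate = _scrypt(password, salt)
    # Constant-time comparison to avoid leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored)


salt, digest = hash_password("hunter2")
ok = verify_password("hunter2", salt, digest)
bad = verify_password("hunter3", salt, digest)
```

Nothing here can turn `digest` back into the password or into any other data, which is why the same trick can't be used to store data you want to read again later.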

There's a concept similar to what you're describing called crypto-shredding[1]. Hashing isn't a good way to ensure the confidentiality of data, just its integrity; you really want a solid encryption algorithm if your goal is to ensure data remains confidential.

The idea behind crypto-shredding is that you have a cryptographic key for each entity in your system and you use that key to encrypt all fields for a given record. When it comes time to delete that data, you simply discard the key used to encrypt it. Assuming you've used reasonably good cryptography, this data is now effectively gone.

This is useful in cases where:

* You need to support the right to be forgotten (as defined in the CCPA[2] or GDPR[3]), since all you need to do to "delete" a user's data is to delete the key used to encrypt.

* The data you need to delete exists across multiple data stores/applications/environments and ensuring consistency for the deletion across all these places is difficult. For example: You may have DB backups, long-lived caches, or 3rd party services/vendors that may have copies of this data.

* You want to discard some, but not all, of a user's data. This is important in cases where you're required by law to retain specific kinds of information even after a person has requested its deletion. For example, banking and finance companies are required to keep specific records about who they sent money to or performed services for.

1. https://en.wikipedia.org/wiki/Crypto-shredding

2. https://www.oag.ca.gov/privacy/ccpa

3. https://en.wikipedia.org/wiki/General_Data_Protection_Regula...
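A toy sketch of the crypto-shredding flow described above, to show the shape of it. Everything here is hypothetical: the `KeyStore` class, the method names, and especially `toy_encrypt`/`toy_decrypt`, which are a dependency-free stand-in (an HMAC-SHA256 keystream with no authentication) so the example runs on the standard library alone. In anything real you would swap them for an authenticated cipher such as AES-GCM from a vetted library and keep the keys in a proper key management service.

```python
import hashlib
import hmac
import os
from typing import Dict


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # STAND-IN ONLY: HMAC-SHA256 in counter mode, not a production cipher.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh nonce per record, stored with the data
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


class KeyStore:
    """One random key per user; the only place the keys live."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def create(self, user_id: str) -> bytes:
        self._keys[user_id] = os.urandom(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError once shredded

    def shred(self, user_id: str) -> None:
        # "Deleting" the user's data everywhere means deleting this key.
        self._keys.pop(user_id, None)


keys = KeyStore()
record = toy_encrypt(keys.create("alice"), b"alice@example.com")
backup_copy = record  # imagine this sitting in a backup or a vendor's system

readable = toy_decrypt(keys.get("alice"), backup_copy)

keys.shred("alice")
# From here on, keys.get("alice") raises KeyError, and every copy of
# `record`, including backup_copy, can no longer be decrypted.
```

Retaining some data while discarding the rest would then come down to using separate keys per category of data and shredding only some of them.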

Hashing would make the content irretrievable; something like XORing with the password would make the password recoverable if you know the content.
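To illustrate the XOR point with a few lines of Python: if the stored bytes are just content XORed with a repeating password, anyone who knows (or guesses) the content gets the password back by XORing the two together. The sample password and record below are made up for the demonstration.

```python
from itertools import cycle


def xor_with_password(data: bytes, password: bytes) -> bytes:
    """The naive scheme: XOR each byte with the repeating password."""
    return bytes(d ^ p for d, p in zip(data, cycle(password)))


password = b"hunter2"
content = b"name=Alice;email=alice@example.com"
stored = xor_with_password(content, password)

# Attacker has the stored bytes and knows the content (known plaintext).
# stored XOR content cancels the content and leaves the repeating password.
recovered = bytes(s ^ c for s, c in zip(stored, content))
leaked_password = recovered[: len(password)]
```

Here `leaked_password` is the original password, and the attacker can then decrypt every other record protected with it.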
XORing with the password sounds just splendid :D Caesar is asking for his cipher back.

That method wouldn't stop a determined 12-year-old, let alone a competent attacker. Please use properly engineered and implemented encryption instead of coming up with harebrained schemes.

Right, which is why you would never XOR in this manner, and would hash instead. You don't want the password or content retrievable that easily.
Most systems store data to which more than one user needs access.

Most systems will restore access for a user who forgot their password.