Hacker News
by ttfkam 1058 days ago
I love pgcrypto and Postgres in general. In this case, though, I'm wondering why you'd perform such widespread encryption on a per-column basis. If there is that much to encrypt, there's whole-disk encryption at rest; for network transport there's TLS. Many column-level encryptions and decryptions would seem to put an undue CPU load on the database instance(s), when that load could more easily be spread horizontally at the app tier.

If you encrypt within Django, all Postgres needs to worry about are bytea columns. If the concern is being able to use the decrypted data effectively in relational joins, I think back again to whole-disk encryption: to use this data at all, it has to be decrypted in memory anyway.
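
A minimal sketch of what app-tier encryption into a bytea column could look like, assuming the third-party `cryptography` package and AES-GCM (my choice here, not something the library under discussion necessarily uses). The helper names `encrypt_field`/`decrypt_field` and the `b"customers.email"` context label are hypothetical, and key storage is out of scope. Postgres would only ever see the returned bytes.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the size recommended for AES-GCM


def encrypt_field(key: bytes, plaintext: str, context: bytes) -> bytes:
    """Encrypt one column value; the result is what would be stored in a bytea column."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    # `context` is authenticated but not encrypted, so a ciphertext copied
    # into a different column (different context) fails to decrypt.
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), context)
    return nonce + ciphertext  # nonce || ciphertext || 16-byte tag


def decrypt_field(key: bytes, blob: bytes, context: bytes) -> str:
    """Reverse of encrypt_field; raises if the blob or context was tampered with."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, context).decode("utf-8")


# Usage: the key would live in app config or a secret store, never in Postgres.
key = AESGCM.generate_key(bit_length=256)
blob = encrypt_field(key, "alice@example.com", b"customers.email")
print(decrypt_field(key, blob, b"customers.email"))  # alice@example.com
```

Because the encryption and decryption run in the app process, that CPU cost scales with the number of app servers rather than landing on the database.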

As a thought experiment, you could create expression indexes for fast lookup, but that leads to data leakage through index queries, and you're right back where you started, only with higher CPU load.
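
An expression index over a decryption call would store plaintext in the index itself. A common app-tier alternative, a "blind index" (swapped in here for illustration; it is not what the comment describes), stores a keyed hash instead. The sketch below, with a hypothetical key and an in-memory stand-in for the table, shows that even this still leaks which rows share a value to anyone who can read the index column.

```python
import hashlib
import hmac

# Hypothetical key; it would have to live outside the database to be useful.
INDEX_KEY = b"hypothetical-index-key-kept-outside-the-database"


def blind_index(value: str) -> str:
    """Deterministic keyed hash of a normalized value, stored in an indexed column."""
    normalized = value.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Stand-in for a table of (row id, blind index of email); the ciphertext column is omitted.
rows = [
    (1, blind_index("alice@example.com")),
    (2, blind_index("bob@example.com")),
    (3, blind_index("Alice@Example.com ")),
]


def lookup(email: str) -> list[int]:
    """Equality lookup without decrypting anything."""
    target = blind_index(email)
    return [row_id for row_id, idx in rows if idx == target]


print(lookup("alice@example.com"))  # [1, 3]
# The leak: rows 1 and 3 visibly share an index value, with no key needed to see it.
print(rows[0][1] == rows[2][1])  # True
```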

For per-user encryption, that also seems best/most flexible at the app tier.
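
One way per-user keys could be handled at the app tier is to derive them from a single master key, so there is no per-user key table to manage. This is a sketch assuming HKDF-SHA256 (RFC 5869), written with only the standard library's `hmac`; `user_key` and the `"user:<id>"` info label are hypothetical names.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256, built from the standard library's HMAC."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: an empty salt is replaced by HashLen zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def user_key(master_key: bytes, user_id: int) -> bytes:
    """Hypothetical helper: one 256-bit key per user, derived from an app-tier master key."""
    return hkdf_sha256(master_key, salt=b"", info=f"user:{user_id}".encode("ascii"))


# Placeholder master key only; a real one would come from a KMS or secret store.
master = bytes(32)
print(user_key(master, 1) != user_key(master, 2))  # True
```

Rotating the master key or revoking one user's data then becomes an app-tier decision that Postgres never needs to know about.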

In short, for a limited use case like saving passwords or an opaque data blob, pgcrypto within Postgres makes sense to me. As an overarching whole-database encryption strategy, I'm far less sure of its utility.


So that people handling the servers are not tempted to look at them, so that backups don't contain them in cleartext, so that exports must explicitly choose whether or not to decrypt them, etc.
Great callouts. It's about ergonomics, making default actions more safe, and reducing unsafe surface area even if it can't be fully removed.
These are Django fields, so you would only use them for your "sensitive columns" like payment information, emails, and other rarely used data.

I checked the README, and I might have missed it, but this doesn't seem to suggest replacing every column with these; they are just helpers that make encrypting specific columns easier.

Because these are rarely used columns, or columns for which you'd be waiting on an external API anyway (email, SMS, payment, etc.), the performance impact should be minimal. You also wouldn't need these fields to be indexed.

The attack surface this addresses is a compromise of Postgres or its host, or mishandled backups. Preferably you'd be using this on top of full-disk encryption.

EDIT: The use of PGP is weird to me, though. Why not AES?