Hacker News new | ask | show | jobs
by boogerlad 2020 days ago
I built something better that's actually secure and performant here: https://github.com/boogerlad/crypt-ids/

The issues with this are

1) as more IDs are generated, the probability of collision increases

2) a non integer primary key is slow

For a single database instance, it's far more performant to leave it as an autoincrementing integer and when it needs to be exposed, encrypt it in the backend before sending it out. Why not hashids.org? It's insecure: https://carnage.github.io/2015/08/cryptanalysis-of-hashids

6 comments

> I built something better that's actually secure and performant here

That's quite a bold claim, isn't it? I also don't see how it's an alternative to OP considering it doesn't even mention how to integrate the code with any database, let alone Postgres.

As an aside, how does your approach deal with collisions? Why would the chance of collisions be any lower than with the random approach if you're using what seems to come down to a cryptographically secure hash function?

Technically, it can be integrated with Postgres with https://github.com/wasmerio/wasmer-postgres. That said, it makes little sense to integrate this at the database level since it'll bloat up the column size, unless just the encrypted integer is stored (without converting to a base58 string).

As for collisions, a hash function maps an input of any length to an output of fixed length, so collisions will happen eventually. What I'm using is encryption, which cannot collide, otherwise decryption would be impossible https://crypto.stackexchange.com/questions/60473/can-collisi...

There are reversible bit hash functions too.

I've used twang's 64 bit mix in places where the domain is just as big as the hash space - 64 bit to 8 byte hash to 12 byte Y64 strings.

That said, the results are almost always ugly, if I had to do it again, I'd do it the way gfycat does (adjective, adjective, animal).

Thanks for pointing me to the concept of bit mix functions, very interesting.
The hashids documentation is pretty explicit about it not being secure. It says verbatim:

> Do you have a question or comment that involves "security" and "hashids" in the same sentence? Don't use Hashids.

It only states that near the end of the page. What is the purpose of it if it's not secure? It only prevents an attacker from performing naive attacks (+1 to get the next id, for example), which might be good enough... But anyone really determined can crack this.
I've done something like this but not as the PK.

Left the PK as an int or UUID, but added another column, with a trigger to auto-populate, that created a base-64 kind of thing. The trigger also detected colisions and tried again, up to 3 times before giving up and erroring.

There's no rule that says what you expose to the public has to be the pk at all. It seemed to me a good idea for it to not be.

In a database like MySQL where a clustered index exists for the primary key, or even a NoSQL DB like DynamoDB, it often makes sense to expose some version of the PK to the public if your lookup pattern for that resource is going to be by PK --- versus having some other "public" primary key field like you describe.

If you look up the resource by some other field, that means that you now need to support two indexes - one for the primary key, and one for the "public" primary key. This obviously requires more storage and comes with the performance overhead of keeping the second index updated on modifications to the table. Additionally, for something like DynamoDB where you pay per index, it could be cost prohibitive.

A better pattern is to simply encrypt/decrypt the primary key before exposing it publicly, such as in a URL. This requires no additional database overhead.

See also u/poitrus' https://github.com/rs/xid

Xid uses the Mongo Object ID algorithm to generate globally unique ids with a different serialization (base64) to make it shorter when transported as a string: https://docs.mongodb.org/manual/reference/object-id/

- 4-byte value representing the seconds since the Unix epoch,

- 3-byte machine identifier,

- 2-byte process id, and

- 3-byte counter, starting with a random value.

Ah, I wish I would've known about format-preserving encryption when I did something similar a few years back! I ended up using IDEA, which (at the time) was still in most crypto libraries and has a 64 bit block size.
FPE is complex and slow. I'd prefer a 64-bit blockcipher when it fits the application.
> 2) a non integer primary key is slow

Is this actually true in Postgres? Unless things have changed Postgres doesn't keep clustered indexes so index fragmentation is not an issue.

It is somewhat slower if you are looking up records by key. You still need an index to do that, and non-integer PKs such as UUIDs can be twice as large as the integer alternative, taking longer to search while requiring more memory.

That being said, in PostgreSQL, you are correct --- having a UUID (or something similar) as a PK is usually fine assuming you understand the implications. However I would absolutely avoid it in a DB like MySQL where PKs are clustered.