Hacker News new | ask | show | jobs
by aartav 1089 days ago
I've been doing this kind of thing for years with two notable differences:

1. I don't believe people actually hand type-in these values, so I'm not really concerned about the 'l' vs '1' issue. I do base 32 without `eiou` (vowels) to reduce the likelihood of words (profanity) sneaking in.

2. I add two base-32 characters as a checksum (salted of course). This is prevents having to go look at the datastore when the value is bogus either by accident or malice. I'm unsure why other implementations don't do this.

4 comments

> base 32 without `eiou` (vowels) to reduce the likelihood of words (profanity) sneaking in.

We had “analrita” as an autogenerated password that resulted in a complaint many years ago. Might consider adding ‘a’ as an excluded letter.

Presumably base 32 means 26 letters + 10 digits - 4 banned letters

So adding an excluded letter is not easy.

Why not use base-31 and (optionally) more characters? (Or go upper and lower or add a symbol if you had to stay with a fixed-size and base-32 for some reason)
Because Base32 is just bit shifting and then converting the 5 bits into a char, and vice versa. Doing Base31 requires base conversion.
Wouldn’t that be excluded because i is already removed ?
I don't think they took offense to the "rita" part.
analratassart would probably be just as bad
ianal
I realise you posted this as a joke, but the first time I saw this, I was so confused. I thought the comment was starting with "I anal" before I read the rest of the post only to compute it means "I Am Not A Lawyer"
I agree with the addition of the checksum, however I’m curious:

> either by accident or malice

1. if you don’t believe people hand type these then how else will they accidentally enter an invalid? I suppose copy/paste errors, or if a page renders it as uppercase, though you should just normalize the case if it’s base 32.

2. How does a 2 byte (non-cryptographically secure) checksum help in the case of malice?

The checksum idea is interesting. I'm considering whether it makes sense to add it as part of the TypeID spec.
What value does the checksum provide? I think I'm missing something because I really don't see a benefit.
The benefit is that you can reject bad requests to an API more easily.

For one application I used a base 58 encoded value. Part of it was a truncated hmac, which I used like check digits. This meant I could validate IDs before hitting the DB. As an attacker or script kiddie could otherwise try a resource exhaustion attack.

So in the age of public internet faceing APIs and app urls, I think built in optional check digit support is a good idea.

I struggle to see how 10 bits of check data will help much. I guess if the extra bits aren’t persisted to storage it doesn’t hurt so why not?
Storage can get corrupted, columns can be truncated. For the applications I tend to touch correctness and the ability to detect errors and tamper are more important that a couple of bytes per row. But every application and domain is different.
Checksums facilitate error detection. For typed UUIDs, checksums help detect errors introduced by changing the prefix/type or changing a “digit”.
I implemented number two as part of an encoding scheme a few months ago. I'm not sure how much it's saved in terms of database lookups but it's aesthetically pleasing to know it won't hit a more inscrutable error while trying to decode.