| HN Mirror


by aartav 1089 days ago

I've been doing this kind of thing for years with two notable differences:

1. I don't believe people actually type these values in by hand, so I'm not really concerned about the 'l' vs '1' issue. I do base 32 without `eiou` (vowels) to reduce the likelihood of words (profanity) sneaking in.

2. I add two base-32 characters as a checksum (salted of course). This prevents having to go look at the datastore when the value is bogus, either by accident or malice. I'm unsure why other implementations don't do this.
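A minimal sketch of the two points above, not the commenter's actual code. It assumes an integer ID, an alphabet of digits plus lowercase letters minus `eiou`, and a salted SHA-256 truncated to 10 bits for the two check characters. The names `SALT`, `make_id`, and `is_valid` are hypothetical.

```python
import hashlib

# 10 digits + 26 letters - 4 excluded vowels (e, i, o, u) = 32 symbols.
ALPHABET = "0123456789abcdfghjklmnpqrstvwxyz"
SALT = b"example-salt"  # hypothetical; a real salt would be kept private


def encode(n: int) -> str:
    """Encode a non-negative integer, 5 bits per character."""
    if n < 0:
        raise ValueError("n must be non-negative")
    chars = []
    while True:
        chars.append(ALPHABET[n & 31])
        n >>= 5
        if n == 0:
            break
    return "".join(reversed(chars))


def checksum(body: str) -> str:
    """Two check characters (10 bits) from a salted hash of the body."""
    digest = hashlib.sha256(SALT + body.encode("ascii")).digest()
    value = int.from_bytes(digest[:2], "big") & 0x3FF
    return ALPHABET[value >> 5] + ALPHABET[value & 31]


def make_id(n: int) -> str:
    body = encode(n)
    return body + checksum(body)


def is_valid(candidate: str) -> bool:
    """Reject malformed or mismatched IDs without touching the datastore."""
    candidate = candidate.lower()  # normalize case before checking
    if len(candidate) < 3 or any(c not in ALPHABET for c in candidate):
        return False
    body, check = candidate[:-2], candidate[-2:]
    return checksum(body) == check
```

With 10 check bits, a randomly guessed ID still passes about 1 time in 1024, so this filters most bogus values but is not an authentication mechanism.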

4 comments

sokoloff 1089 days ago

> base 32 without `eiou` (vowels) to reduce the likelihood of words (profanity) sneaking in.

We had “analrita” as an autogenerated password that resulted in a complaint many years ago. Might consider adding ‘a’ as an excluded letter.

michaelt 1089 days ago

Presumably base 32 means 26 letters + 10 digits - 4 banned letters

So adding an excluded letter is not easy.

sokoloff 1089 days ago

Why not use base-31 and (optionally) more characters? (Or go upper and lower or add a symbol if you had to stay with a fixed-size and base-32 for some reason)

chipsa 1089 days ago

Because Base32 is just bit shifting and then converting the 5 bits into a char, and vice versa. Doing Base31 requires base conversion.
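To illustrate the difference, here is a rough sketch under assumed alphabets (digits plus lowercase letters minus `eiou`, and a hypothetical 31-symbol variant that also drops `a`). The base-32 encoder only shifts and masks, while the base-31 encoder has to treat the whole input as one big integer and divide repeatedly.

```python
ALPHABET32 = "0123456789abcdfghjklmnpqrstvwxyz"
# Hypothetical 31-symbol alphabet: the same set with 'a' removed as well.
ALPHABET31 = ALPHABET32[:10] + ALPHABET32[11:]


def encode_base32(data: bytes) -> str:
    """Stream bits through a small buffer, emitting one char per 5 bits."""
    out = []
    buffer = 0
    bits = 0
    for byte in data:
        buffer = (buffer << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append(ALPHABET32[(buffer >> bits) & 31])
            buffer &= (1 << bits) - 1  # keep only the unconsumed bits
    if bits:
        # Pad the final partial group with zero bits on the right.
        out.append(ALPHABET32[(buffer << (5 - bits)) & 31])
    return "".join(out)


def encode_base31(data: bytes) -> str:
    """Radix conversion: needs arbitrary-precision division by 31.

    Note: leading zero bytes are not preserved in this simple version.
    """
    n = int.from_bytes(data, "big")
    out = []
    while True:
        n, remainder = divmod(n, 31)
        out.append(ALPHABET31[remainder])
        if n == 0:
            break
    return "".join(reversed(out))
```

Python hides the cost because its integers are arbitrary precision; in a language with fixed-width integers, the base-31 path needs a big-integer routine, while the base-32 path works on a few bits at a time.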

manquer 1089 days ago

Wouldn’t that be excluded because 'i' is already removed?

jhgg 1089 days ago

I don't think they took offense to the "rita" part.

bombcar 1089 days ago

analratassart would probably be just as bad

EGreg 1089 days ago

ianal

spoiler 1089 days ago

I realise you posted this as a joke, but the first time I saw this, I was so confused. I thought the comment was starting with "I anal" before I read the rest of the post only to compute it means "I Am Not A Lawyer"

tlrobinson 1089 days ago

I agree with the addition of the checksum, however I’m curious:

> either by accident or malice

1. If you don’t believe people hand type these, then how else will they accidentally enter an invalid value? I suppose copy/paste errors, or if a page renders it as uppercase, though you should just normalize the case if it’s base 32.

2. How does a two-character (non-cryptographically secure) checksum help in the case of malice?

dloreto 1089 days ago

The checksum idea is interesting. I'm considering whether it makes sense to add it as part of the TypeID spec.

veec_cas_tant 1089 days ago

What value does the checksum provide? I think I'm missing something because I really don't see a benefit.

diroussel 1085 days ago

The benefit is that you can reject bad requests to an API more easily.

For one application I used a base 58 encoded value. Part of it was a truncated HMAC, which I used like check digits. This meant I could validate IDs before hitting the DB. Otherwise, an attacker or script kiddie could try a resource exhaustion attack.

So in the age of public internet-facing APIs and app URLs, I think built-in optional check digit support is a good idea.
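A sketch of the truncated-HMAC idea, under assumptions of my own rather than the scheme described above: a random 12-byte body, the Bitcoin base 58 alphabet, HMAC-SHA256, and a 4-character tag. `SECRET_KEY`, `TAG_LEN`, `new_id`, and `looks_valid` are hypothetical names.

```python
import hashlib
import hmac
import secrets

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
SECRET_KEY = b"hypothetical-server-side-key"  # assumption: kept off the client
TAG_LEN = 4  # assumption: 4 base 58 chars, roughly 23 bits of check data


def b58encode(data: bytes) -> str:
    """Plain radix conversion; leading zero bytes become leading '1's."""
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, remainder = divmod(n, 58)
        out.append(B58[remainder])
    pad = len(data) - len(data.lstrip(b"\x00"))
    return B58[0] * pad + "".join(reversed(out))


def tag(body: str) -> str:
    """Fixed-width check characters from a truncated HMAC of the body."""
    mac = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).digest()
    value = int.from_bytes(mac[:8], "big") % (58 ** TAG_LEN)
    chars = []
    for _ in range(TAG_LEN):
        value, remainder = divmod(value, 58)
        chars.append(B58[remainder])
    return "".join(chars)


def new_id() -> str:
    body = b58encode(secrets.token_bytes(12))
    return body + tag(body)


def looks_valid(candidate: str) -> bool:
    """Cheap check to run before any database lookup."""
    if len(candidate) <= TAG_LEN or any(c not in B58 for c in candidate):
        return False
    body, given = candidate[:-TAG_LEN], candidate[-TAG_LEN:]
    return hmac.compare_digest(tag(body), given)
```

Because the tag depends on a secret key, a client without the key cannot compute valid check characters and can only guess them, which is what distinguishes this from a plain checksum.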

veec_cas_tant 1085 days ago

I struggle to see how 10 bits of check data will help much. I guess if the extra bits aren’t persisted to storage it doesn’t hurt so why not?

diroussel 1084 days ago

Storage can get corrupted, columns can be truncated. For the applications I tend to touch, correctness and the ability to detect errors and tampering are more important than a couple of bytes per row. But every application and domain is different.

bumbledraven 1089 days ago

Checksums facilitate error detection. For typed UUIDs, checksums help detect errors introduced by changing the prefix/type or changing a “digit”.

zrail 1089 days ago

I implemented number two as part of an encoding scheme a few months ago. I'm not sure how much it's saved in terms of database lookups but it's aesthetically pleasing to know it won't hit a more inscrutable error while trying to decode.