| Neat library! We're using randomly generated strings for many things. IDs, password recovery tokens, etc. We've generated millions of them in our system, for various use-cases. Hundreds of thousands of people see them every day. I've never heard any complaints about a random content-id being "lR8vDick4r" (dick) or whatever. But nowadays our society is so afraid of offending anyone that profanity filters have extended all the way to database IDs and password recovery tokens. (There are some legit cases, like randomly generated IDs for user profiles shared in public URLs, that users have to live with, but even there just make the min length 8 and you're unlikely to have any full-word profanity as the complete ID; put differently, I don't understand why they made the block list an opt-out thing.) |
First, the block list is highly incomplete: you can find at least 10x more combinations spelling the same "word", and probably 10x more slurs that aren't in the list at all. Second, it's hardcoded in your source. Third, there are more elegant solutions.
For example, pick an alphabet that can't spell readable words unless you're trying really hard to read a slur into it. Say this (no vowels or digits):
bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ (length 42)
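A minimal sketch of generating IDs from that alphabet, in Python. This is not the library's API; `ALPHABET` and `random_id` are names I made up for illustration, and the default length of 24 is just an assumption matching a roughly 128-bit ID.

```python
import secrets

# Consonants only, both cases: no vowels, no digits (42 characters).
ALPHABET = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"

def random_id(length: int = 24) -> str:
    """Return a random ID drawn uniformly from ALPHABET (hypothetical helper)."""
    # secrets.choice uses the OS CSPRNG, so this is fine for tokens too.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

if __name__ == "__main__":
    print(random_id())
```

Since every character is picked independently with `secrets.choice`, there is no modulo bias to worry about, and no block list to maintain.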
The full lower+upper+digits alphabet they use is 62. Feels like you're losing a lot, but... not really.
- A 128-bit id in base 62 = 22 letters.
- A 128-bit id in base 42 = 24 letters.
JUST TWO MORE LETTERS. And it's one more letter for 64-bit id (11 vs 12). And we can avoid this entire silliness. The problem is the author doesn't realize that logN is... logarithmic, I suppose.
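If you want to check those numbers yourself, here is a quick sketch in Python. The helper name `id_length` is mine, not anything from the library; it just finds the smallest number of characters whose combinations cover the full bit range, using exact integer arithmetic to avoid floating-point rounding.

```python
def id_length(bits: int, base: int) -> int:
    """Smallest n such that base**n >= 2**bits (hypothetical helper)."""
    target = 2 ** bits
    n = 0
    capacity = 1  # number of distinct IDs representable with n characters
    while capacity < target:
        capacity *= base
        n += 1
    return n

if __name__ == "__main__":
    for bits in (64, 128):
        print(bits, "bits:", id_length(bits, 62), "chars in base 62,",
              id_length(bits, 42), "chars in base 42")
```

The length grows as bits / log2(base), so shrinking the alphabet from 62 to 42 characters only costs about 10% more characters.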