Hacker News
by mbrubeck 6169 days ago
"Second trial, you use a alpha-numeric whitelist and split on anything else, but what about umlauts? What about hebrew or cyrillic?"

A multi-lingual version of this could use the Unicode "General Category" character classes (Letter, Mark, Number, Punctuation, Symbol, Separator, Other).
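
A rough sketch of what that might look like in Python, using the standard library's `unicodedata.category`. The function name `split_words` and the choice to treat Letter, Mark, and Number as "word" characters (and split on everything else) are my own assumptions for illustration, not something from the original article:

```python
import unicodedata

# Assumption: characters in these major General Category classes belong to words.
# L = Letter, M = Mark (e.g. combining accents), N = Number.
WORD_CLASSES = {"L", "M", "N"}

def split_words(text):
    """Split text on any character whose major category is not in WORD_CLASSES."""
    words, current = [], []
    for ch in text:
        # category() returns a two-letter code such as "Lu" or "Po";
        # the first letter is the major class.
        if unicodedata.category(ch)[0] in WORD_CLASSES:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

print(split_words("Füße, שלום мир!"))
```

Including Mark keeps a decomposed sequence such as "u" followed by U+0308 (combining diaeresis) inside the word instead of splitting on the accent. Whether digits, apostrophes, or hyphens should count as word characters depends on the application.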