by dystroy 984 days ago
But not all normalization is done to fight spam, and not all of it needs to care about visual similarity.

I normalize strings in searches not because of bad intents but because for all user related purposes "Comunicações" and "Comunicações" are the same, their different encodings being more of an accident.
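As a rough illustration of the point above, here is a minimal Python sketch using the standard `unicodedata` module. The two spellings are written with explicit escapes so the difference survives copy and paste. The helper name `search_key` and the choice of NFC are assumptions made for the example, not something stated in the comment.

```python
import unicodedata

# Precomposed spelling: U+00E7 (c with cedilla) and U+00F5 (o with tilde).
composed = "Comunica\u00e7\u00f5es"
# Decomposed spelling: plain "c" and "o" followed by combining marks.
decomposed = "Comunicac\u0327o\u0303es"

def search_key(text: str) -> str:
    """Hypothetical helper: map canonically equivalent strings to one form."""
    return unicodedata.normalize("NFC", text)

# The raw strings differ code point by code point...
print(composed == decomposed)
# ...but compare equal once both are normalized to the same form.
print(search_key(composed) == search_key(decomposed))
```

The raw comparison fails because the second string carries two extra combining code points. After normalization the two are identical, which is what a user searching for the word would expect.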

1 comment

*nod* ...and stemming is that taken to a greater extreme.

I was just pointing out that Unicode itself has various forms of normalization and normalization-adjacent functionality that people are far too unaware of.
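For readers who have not met these, a small Python sketch of the four standard normalization forms plus two adjacent operations follows. Which features the reply had in mind is not stated, so the selection here (compatibility normalization, case folding, stripping combining marks) is an assumption, and `strip_marks` is a hypothetical helper rather than a Unicode-defined form.

```python
import unicodedata

# "file" spelled with U+FB01 LATIN SMALL LIGATURE FI.
ligature = "\ufb01le"

# Canonical forms (NFC, NFD) keep the ligature as it is.
nfc = unicodedata.normalize("NFC", ligature)
# Compatibility forms (NFKC, NFKD) expand it to plain "f" + "i".
nfkc = unicodedata.normalize("NFKC", ligature)

# Case folding is not a normalization form, but it is often used next to one
# for caseless matching: U+00DF (sharp s) folds to "ss".
folded = "Stra\u00dfe".casefold()

def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose, then drop combining marks.

    This is lossy accent-insensitive matching, not a Unicode normalization form.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(nfc == ligature, nfkc, folded, strip_marks("Comunica\u00e7\u00f5es"))
```

Each step discards a bit more distinction than the one before, which is the same trade-off stemming makes at the word level.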