by nine_k
2720 days ago
If you build a plain index over Unicode strings, chances are you're doing it wrong. Normal DB indexes are mostly for numbers and (short) ASCII strings. (Something like canonicalized UTF-8 is an edge case.) For strings that have encodings and locales, you likely need a full-text index, provided by your DB or by something like Solr / Elasticsearch. A full-text index handles the oddities of human-oriented text better.

Wanting an index that lets you efficiently search by non-ASCII strings, such as a person's last name, is a pretty common use case (so common that it's usually taught in an introduction to databases).
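
To make the canonicalization point concrete, here is a minimal Python sketch of what can go wrong with an exact-match index over raw Unicode strings. The `index_key` helper and the dict-as-index are hypothetical stand-ins, not any particular database's behavior; the only assumption is that keys are compared code point by code point, which is what a plain byte-wise index effectively does.

```python
import unicodedata


def index_key(name: str) -> str:
    """Hypothetical helper: canonicalize a string before using it as an index key."""
    # NFC composes base characters and combining marks into a single form,
    # so visually identical strings end up with identical code points.
    return unicodedata.normalize("NFC", name)


composed = "M\u00fcller"      # "ü" as the single code point U+00FC
decomposed = "Mu\u0308ller"   # "u" followed by combining diaeresis U+0308

# A plain dict stands in for an exact-match index (illustrative only).
raw_index = {composed: 1}
canonical_index = {index_key(composed): 1}

# Looking up the decomposed spelling misses in the raw index...
raw_hit = raw_index.get(decomposed)

# ...but hits once both the stored key and the query are canonicalized.
canonical_hit = canonical_index.get(index_key(decomposed))
```

Normalization only covers the "same string, different code points" problem. Locale-specific ordering, case and accent insensitivity are separate concerns that this sketch does not attempt to handle.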