by dragontamer
990 days ago
The example code is a 256-byte lookup table, which means the original example won't work with Unicode either. In fact, I'm pretty sure Unicode can't be handled with a byte lookup table because UTF-8 is a variable-length encoding, but maybe you can prove me wrong? (Assuming UTF-8 here.)
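To make the 256-byte-table point concrete, here is a minimal sketch (not the example code being discussed, which isn't shown here) of a Latin-1 lowercasing table applied with Python's `bytes.translate`. The names `build_latin1_lower_table`, `LOWER` and `latin1_lower` are hypothetical. It works byte-for-byte on Latin-1 input, but on UTF-8 input it treats the lead byte 0xC3 as 'Ã' and rewrites it, corrupting the multi-byte sequence.

```python
# A 256-entry byte table for lowercasing Latin-1 (ISO-8859-1) text.
def build_latin1_lower_table() -> bytes:
    table = bytearray(range(256))   # start as the identity mapping
    for b in range(0x41, 0x5B):     # 'A'..'Z' -> 'a'..'z'
        table[b] = b + 0x20
    for b in range(0xC0, 0xDF):     # 'À'..'Þ' -> 'à'..'þ'
        if b != 0xD7:               # 0xD7 is '×', which has no case
            table[b] = b + 0x20
    return bytes(table)

LOWER = build_latin1_lower_table()

def latin1_lower(data: bytes) -> bytes:
    # bytes.translate performs one table lookup per input byte
    return data.translate(LOWER)

if __name__ == "__main__":
    print(latin1_lower("ÉCOLE".encode("latin-1")).decode("latin-1"))  # école
    # Same table on UTF-8 bytes: the lead byte 0xC3 is mistaken for 'Ã'
    # and rewritten to 0xE3, leaving an invalid UTF-8 sequence.
    print(latin1_lower("É".encode("utf-8")))  # b'\xe3\x89'
```

The table itself is trivial; the problem is that a per-byte lookup has no way to know that a byte is part of a longer UTF-8 sequence.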
Unicode does have a limited code space, but it can't practically be stored in a single table. Assigned characters currently run up to 0x323AF, a bit over 200k code points (ignoring the special-purpose and private-use planes at the very top), and most of them of course have no lowercase/uppercase mapping. The implementations I've seen do a few comparisons and then delegate to a table.
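A "few comparisons, then a table" lookup is often done with a two-stage table: the high bits of a code point select a block, and identical blocks (most of them contain no mappings at all) are stored only once. The sketch below is an illustration under assumptions, not any particular library's implementation: the block size of 256 is an arbitrary choice, the data is derived from Python's own `str.lower`, only simple one-to-one mappings are kept (multi-character results such as U+0130 are left unmapped), and `build_tables`, `INDEX`, `BLOCKS` and `to_lower` are hypothetical names.

```python
# Two-stage lookup table for simple (1:1) lowercase mappings.
BLOCK_BITS = 8
BLOCK_SIZE = 1 << BLOCK_BITS
MAX_CP = 0x323B0  # one past U+323AF, the upper bound quoted above

def build_tables():
    index = []   # stage 1: high bits of a code point -> block number
    blocks = []  # stage 2: unique blocks of (lowercase - cp) deltas
    seen = {}    # dedupes identical blocks (most are all zeros)
    for start in range(0, MAX_CP, BLOCK_SIZE):
        deltas = []
        for cp in range(start, start + BLOCK_SIZE):
            low = chr(cp).lower()
            # Multi-character results (e.g. U+0130) are left unmapped here.
            deltas.append(ord(low) - cp if len(low) == 1 else 0)
        key = tuple(deltas)
        if key not in seen:
            seen[key] = len(blocks)
            blocks.append(key)
        index.append(seen[key])
    return index, blocks

INDEX, BLOCKS = build_tables()

def to_lower(cp: int) -> int:
    # Fast path: ASCII needs only comparisons, no table access.
    if cp < 0x80:
        return cp + 0x20 if 0x41 <= cp <= 0x5A else cp
    hi = cp >> BLOCK_BITS
    if hi >= len(INDEX):
        return cp  # beyond the table: assume no mapping
    return cp + BLOCKS[INDEX[hi]][cp & (BLOCK_SIZE - 1)]
```

Storing deltas rather than target code points is what lets so many blocks collapse into one shared entry; in a compiled language the two stages would typically be small integer arrays.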
But: horses for courses. If you have to normalize a lot of Latin-1 text (such as transforming case or stripping diacritics), you can probably write vector instructions that run circles around a simple LUT. But it's not going to be as easy.
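One way to do the case-transform half without a table is to test every byte lane in parallel with plain arithmetic. The sketch below uses SWAR ("SIMD within a register") on 64-bit words as a stand-in for real vector instructions; it is an assumption about the kind of code meant, not a benchmark, and Python ints only model the per-lane logic (they won't be fast). In C the same operations would run on `uint64_t` or on SSE2/NEON registers. Diacritic stripping is not covered, and `rep`, `lanes_ge`, `lower_word` and `swar_latin1_lower` are hypothetical names.

```python
# SWAR Latin-1 lowercasing: 8 byte lanes per 64-bit word, no lookup table.
def rep(byte: int) -> int:
    """Broadcast one byte value into all 8 lanes of a 64-bit word."""
    return byte * 0x0101010101010101

HIGH = rep(0x80)  # top bit of every lane
LOW7 = rep(0x7F)  # low seven bits of every lane

def lanes_ge(v: int, k: int) -> int:
    # v holds 7-bit lanes (top bits clear). Adding 0x80 - k sets a lane's top
    # bit exactly when that lane is >= k; sums stay <= 0xFF, so no carries.
    return (v + rep(0x80 - k)) & HIGH

def lower_word(w: int) -> int:
    h = w & HIGH  # lanes holding bytes 0x80..0xFF
    v = w & LOW7
    ne57 = ((v ^ rep(0x57)) + LOW7) & HIGH  # low 7 bits != 0x57 (0xD7 is '×')
    ascii_upper = lanes_ge(v, 0x41) & ~lanes_ge(v, 0x5B) & ~h        # 0x41..0x5A
    latin_upper = lanes_ge(v, 0x40) & ~lanes_ge(v, 0x5F) & h & ne57  # 0xC0..0xDE except 0xD7
    # Shift each lane's flag from bit 7 down to bit 5 (0x20) and set that bit.
    return w | ((ascii_upper | latin_upper) >> 2)

def swar_latin1_lower(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 8):
        chunk = data[i:i + 8]  # a short final chunk just leaves upper lanes zero
        w = int.from_bytes(chunk, "little")
        out += lower_word(w).to_bytes(len(chunk), "little")
    return bytes(out)
```

Every lane gets the same branch-free sequence of adds, ANDs and a shift, which is what maps well onto wide vector registers; the cost is that each range in the case mapping has to be encoded by hand, which is the "not as easy" part.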