by tgv 990 days ago
256 is indeed pretty limited.

Unicode does have a limited space, but it cannot practically be stored in a single table. It currently runs up to 0x323AF, a bit over 200k, and most of the characters of course don't have a lower/uppercase mapping. The implementations I've seen do a few range comparisons first and then delegate to a table for whatever is left.
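To make "a few comparisons, then a table" concrete, here is a rough sketch of the shape, not any particular library's code. The names `to_lower_cp` and `_SPARSE_LOWER` are made up for illustration, and the table holds only three sample entries; a real one would be generated from the Unicode Character Database.

```python
# Hypothetical sketch: range checks for the dense blocks, sparse table for the rest.
# _SPARSE_LOWER is illustrative only; a real table would be generated from the UCD.
_SPARSE_LOWER = {
    0x0100: 0x0101,  # Ā -> ā
    0x0178: 0x00FF,  # Ÿ -> ÿ
    0x1E9E: 0x00DF,  # ẞ -> ß
}


def to_lower_cp(cp: int) -> int:
    """Map one code point to its simple lowercase form (single code point only)."""
    if cp < 0x80:
        # ASCII fast path: A-Z sit exactly 0x20 below a-z.
        return cp + 0x20 if 0x41 <= cp <= 0x5A else cp
    if 0xC0 <= cp <= 0xDE and cp != 0xD7:
        # Latin-1 capitals, skipping the multiplication sign at 0xD7.
        return cp + 0x20
    if 0x0391 <= cp <= 0x03A9 and cp != 0x03A2:
        # Greek capitals; 0x03A2 is unassigned.
        return cp + 0x20
    if 0x0410 <= cp <= 0x042F:
        # Basic Cyrillic capitals.
        return cp + 0x20
    # Everything else: look it up, defaulting to "no mapping".
    return _SPARSE_LOWER.get(cp, cp)
```

The point of the ordering is that the common blocks never touch the table, so the table only has to cover the scattered leftovers.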

But: horses for courses. If you have to normalize a lot of Latin-1 text (such as transforming case or stripping diacritics), you can probably write some vector instructions that run circles around a simple LUT. But it's not going to be as easy.