Hacker News
by tgv 990 days ago
Except that in Unicode, this doesn't work.
2 comments

The example code is a 256-byte lookup table, meaning the original example won't work with Unicode either.

In fact, I'm pretty sure that Unicode cannot be solved with a lookup table due to its variable-length encoding, but maybe you can prove me wrong? (Assuming UTF-8 here.)

256 is indeed pretty limited.

Unicode does have a limited space, but it cannot practically be stored in a single table. It currently runs up to 0x323AF, a bit over 200k, and most of the characters of course don't have a lowercase/uppercase mapping. The implementations I've seen do a few comparisons and then delegate to a table.
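
To make the "few comparisons, then a table" idea concrete, here is a minimal sketch for lowercasing a single code point. The names `SPARSE_LOWER` and `to_lower_cp` are hypothetical, and the table holds only a handful of entries for illustration; a real implementation would generate it from the Unicode data files.

```python
# Hypothetical sketch: cheap range checks first, sparse table for the rest.
# SPARSE_LOWER is deliberately incomplete; it only illustrates the shape.
SPARSE_LOWER = {
    0x0178: 0x00FF,  # LATIN CAPITAL LETTER Y WITH DIAERESIS -> small form
    0x0391: 0x03B1,  # GREEK CAPITAL LETTER ALPHA -> small alpha
    0x0410: 0x0430,  # CYRILLIC CAPITAL LETTER A -> small a
}


def to_lower_cp(cp: int) -> int:
    """Return the lowercase code point for cp, or cp itself if unmapped."""
    # ASCII A-Z: fixed offset of 0x20.
    if 0x41 <= cp <= 0x5A:
        return cp + 0x20
    # Everything else below 0xC0 has no lowercase mapping.
    if cp < 0xC0:
        return cp
    # Latin-1 capitals 0xC0-0xDE, skipping the multiplication sign 0xD7.
    if cp <= 0xDE and cp != 0xD7:
        return cp + 0x20
    # Everything else: sparse lookup, defaulting to "no mapping".
    return SPARSE_LOWER.get(cp, cp)
```

Most code points fall through to the default, which is why a sparse table is enough and a flat 200k-entry array is not needed.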

But: horses for courses. If you have to normalize a lot of Latin-1 text (such as transforming case or stripping diacritics), you can probably write some vector instructions that run circles around a simple LUT. But it's not going to be as easy.
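
For reference, the "simple LUT" baseline for Latin-1 lowercasing might look roughly like this. It is only a sketch: `build_latin1_lower_table` and `lower_latin1` are made-up names, and it assumes the input really is Latin-1 bytes, not UTF-8.

```python
def build_latin1_lower_table() -> bytes:
    """Build a 256-byte table mapping each Latin-1 byte to its lowercase."""
    table = bytearray(range(256))  # start with the identity mapping
    for b in range(0x41, 0x5B):    # ASCII A-Z
        table[b] = b + 0x20
    for b in range(0xC0, 0xDF):    # Latin-1 capitals 0xC0-0xDE
        if b != 0xD7:              # 0xD7 is the multiplication sign
            table[b] = b + 0x20
    return bytes(table)


LATIN1_LOWER = build_latin1_lower_table()


def lower_latin1(data: bytes) -> bytes:
    """Lowercase Latin-1 encoded bytes with one table lookup per byte."""
    return data.translate(LATIN1_LOWER)
```

Feeding UTF-8 through the same table would rewrite lead bytes such as 0xC3 and corrupt multi-byte sequences, which is the variable-length problem mentioned above.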

What the fuck do you expect "bit reversal" to even do in Unicode?

I had the vague impression this was about lowercase. No need for expletives. Perhaps you should comment elsewhere.