by kaetemi 1707 days ago
Last year I wrote a case conversion routine for Ryzom Core that processes UTF-8 directly. The tables look like a mess, but it massively improved performance over the code it replaced. Case conversion seems to be called more often in the game than I expected. I do wonder if there's a cleaner and faster way.

https://github.com/ryzom/ryzomcore/blob/core4/nel/src/misc/s...

1 comment

Working directly on encoded UTF-8 sequences is a nice trick that lets you look up Unicode properties without even decoding a character. I did something similar for Apache Lucy [1]. Note that you can store the data for each "level" in a single table and compute the index with bit operations, as explained in the article.

[1] https://gitbox.apache.org/repos/asf?p=lucy.git;a=blob;f=core...
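
For anyone curious what the single-table-per-level idea might look like, here is a rough sketch in Python. It is not the Ryzom Core or Lucy code. The names `seq_len`, `build_tables`, `lookup` and `lower_target` are made up for illustration, and the property stored (the code point of a single-character lowercase mapping, or 0 if there is none) is just an assumption chosen to keep the example small. Each byte position gets one flat table; the entry found at one level is a block number, and the index into the next level is `(block << 6) | (byte & 0x3F)`.

```python
def seq_len(lead):
    """Length of a UTF-8 sequence, derived from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def _flatten(node, depth, levels, seen):
    """Append node as a block to levels[depth] (deduplicated), return its block number."""
    size = 256 if depth == 0 else 64
    block = [0] * size
    for key, child in node.items():
        if isinstance(child, dict):
            block[key] = _flatten(child, depth + 1, levels, seen)
        else:
            block[key] = child
    block = tuple(block)
    index = seen[depth].get(block)
    if index is None:
        index = len(levels[depth]) // size
        seen[depth][block] = index
        levels[depth].extend(block)
    return index


def build_tables(prop, limit=0x10000):
    """Build one flat table per byte position for prop(cp) -> int (0 means 'nothing')."""
    trie = {}
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates have no UTF-8 encoding
            continue
        value = prop(cp)
        if value == 0:
            continue
        bs = chr(cp).encode("utf-8")
        # Lead byte is used whole; continuation bytes contribute their low 6 bits.
        keys = [bs[0]] + [b & 0x3F for b in bs[1:]]
        node = trie
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    # Block 0 of every continuation level is all zeros, so missing paths resolve to 0.
    levels = [[], [0] * 64, [0] * 64, [0] * 64]
    seen = [{}, {(0,) * 64: 0}, {(0,) * 64: 0}, {(0,) * 64: 0}]
    _flatten(trie, 0, levels, seen)
    return levels


def lookup(levels, data, pos=0):
    """Look up the property of the sequence starting at data[pos] without decoding it.

    Assumes well-formed UTF-8; continuation bytes are not validated.
    """
    entry = levels[0][data[pos]]
    for k in range(1, seq_len(data[pos])):
        entry = levels[k][(entry << 6) | (data[pos + k] & 0x3F)]
    return entry


def lower_target(cp):
    """Hypothetical property: code point of a one-character lowercase mapping, else 0."""
    c = chr(cp)
    low = c.lower()
    return ord(low) if len(low) == 1 and low != c else 0


LEVELS = build_tables(lower_target)
mapped = lookup(LEVELS, "É".encode("utf-8"))  # code point of "é"
```

A real converter would probably store the replacement bytes (or a delta) rather than a code point, so the output never needs re-encoding, and would generate the tables offline instead of at startup.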