Hacker News
by jansan 1386 days ago
I do not know about the real-world implications of this, but just reading about a 20x performance increase for standard cases makes me excited.

There are some basic things that can give huge performance increases without any parallelization. I wrote a Unicode crate because I needed a different interface for getting grapheme clusters from a string than the existing crates offered. I also wrote a character category crate because I thought I would need it for this purpose (I didn't). I managed to get double the performance of the existing segmentation crate and 10–20x the performance of the existing Unicode category crate.

https://crates.io/crates/finl_unicode

All with two-step tables instead of range and binary search?

That's extremely interesting, since I still favor range and binary search for most cases and use two-step tables only for normalization lookup.
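For comparison, the range-and-binary-search approach can be sketched roughly as follows. This is a generic illustration, not code from finl_unicode or any other crate: the `Category` enum, the `Unknown` catch-all, the `RANGES` table and `category_by_search` are hypothetical, and the table is a tiny ASCII excerpt rather than real Unicode Character Database data. Each lookup costs O(log n) comparisons over the sorted list of inclusive ranges.

```rust
use std::cmp::Ordering;

/// A handful of Unicode general categories. `Unknown` is a hypothetical
/// catch-all for code points this toy table does not cover.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category {
    Lu, // uppercase letter
    Ll, // lowercase letter
    Nd, // decimal digit
    Zs, // space separator
    Unknown,
}

/// Sorted, non-overlapping, inclusive (start, end, category) ranges.
/// Illustrative ASCII excerpt only, not real Unicode data.
pub const RANGES: &[(u32, u32, Category)] = &[
    (0x20, 0x20, Category::Zs),
    (0x30, 0x39, Category::Nd),
    (0x41, 0x5A, Category::Lu),
    (0x61, 0x7A, Category::Ll),
];

/// Look up a category with a binary search over the range list.
pub fn category_by_search(c: char) -> Category {
    let cp = c as u32;
    let found = RANGES.binary_search_by(|&(lo, hi, _)| {
        if hi < cp {
            Ordering::Less // whole range lies below cp
        } else if lo > cp {
            Ordering::Greater // whole range lies above cp
        } else {
            Ordering::Equal // lo <= cp <= hi
        }
    });
    match found {
        Ok(i) => RANGES[i].2,
        Err(_) => Category::Unknown, // cp falls in a gap between ranges
    }
}
```

The range list stays very small because it only stores boundaries, which is the main appeal of this approach; the price is a data-dependent number of comparisons and branches per lookup.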

Yes. The two-step tables are really not that expensive and they enable features not possible with range and binary search, like identifying the category of a character cheaply.
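A two-step (two-stage) table can be sketched like this. It is a generic illustration under stated assumptions, not the layout finl_unicode actually uses: the 256-entry block size, the `TwoStageTable` type, `block_count`, the `Unknown` catch-all and the ASCII-only `RANGES` data are all hypothetical. The first stage maps the high bits of a code point to a block number; the second stage indexes that block with the low 8 bits, so every lookup is two array reads with no search. Identical blocks are stored once, which is what keeps the tables from being expensive.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Category {
    Lu, // uppercase letter
    Ll, // lowercase letter
    Nd, // decimal digit
    Zs, // space separator
    Unknown, // hypothetical catch-all for code points not in the toy data
}

/// Illustrative ASCII excerpt only, not real Unicode data.
pub const RANGES: &[(u32, u32, Category)] = &[
    (0x20, 0x20, Category::Zs),
    (0x30, 0x39, Category::Nd),
    (0x41, 0x5A, Category::Lu),
    (0x61, 0x7A, Category::Ll),
];

const SHIFT: u32 = 8; // low 8 bits select the entry inside a block
const BLOCK: usize = 1 << SHIFT; // 256 code points per block (assumed size)
const MAX_CP: u32 = 0x10FFFF;

pub struct TwoStageTable {
    stage1: Vec<u16>, // block number for each 256-code-point chunk
    stage2: Vec<[Category; BLOCK]>, // deduplicated blocks
}

impl TwoStageTable {
    /// Build from inclusive ranges; uncovered code points map to Unknown.
    pub fn build(ranges: &[(u32, u32, Category)]) -> Self {
        let mut stage1 = Vec::new();
        let mut stage2: Vec<[Category; BLOCK]> = Vec::new();
        let mut seen: HashMap<[Category; BLOCK], u16> = HashMap::new();
        for chunk in 0..=(MAX_CP >> SHIFT) {
            let base = chunk << SHIFT;
            let mut block = [Category::Unknown; BLOCK];
            for &(lo, hi, cat) in ranges {
                // Fill only the part of the range inside this chunk;
                // the loop is empty when the range misses the chunk.
                let start = lo.max(base);
                let end = hi.min(base + BLOCK as u32 - 1);
                for cp in start..=end {
                    block[(cp - base) as usize] = cat;
                }
            }
            // Reuse an identical block if one was already stored.
            let id = *seen.entry(block).or_insert_with(|| {
                stage2.push(block);
                (stage2.len() - 1) as u16
            });
            stage1.push(id);
        }
        TwoStageTable { stage1, stage2 }
    }

    /// Two array reads: high bits pick the block, low bits the entry.
    pub fn category(&self, c: char) -> Category {
        let cp = c as u32;
        let block = self.stage1[(cp >> SHIFT) as usize] as usize;
        self.stage2[block][(cp as usize) & (BLOCK - 1)]
    }

    pub fn block_count(&self) -> usize {
        self.stage2.len()
    }
}
```

With this toy data the 4,352 chunks collapse to just two distinct blocks (the ASCII block and an all-`Unknown` block). Real Unicode data also contains long runs of identical blocks, such as unassigned regions or blocks consisting entirely of CJK ideographs, so deduplication does most of the work of keeping the second stage small.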