There are some basic things that can give huge performance increases without even parallelization: I wrote a Unicode crate because I needed a different interface to getting grapheme clusters from a string than the existing crates offered. I wrote a character category crate as well because I thought I would need it for this purpose (I didn’t). I managed to get double the performance of the existing segmentation crate and 10–20x the performance of the existing Unicode category crate.
Yes. The two-step tables are really not that expensive and they enable features not possible with range and binary search, like identifying the category of a character cheaply.
https://crates.io/crates/finl_unicode