| HN Mirror



	by dhosek 1385 days ago
	Some basic techniques can give huge performance increases even without parallelization: I wrote a Unicode crate because I needed a different interface for getting grapheme clusters from a string than the existing crates offered. I also wrote a character category crate because I thought I would need it for this purpose (I didn’t). I managed to get double the performance of the existing segmentation crate and 10–20x the performance of the existing Unicode category crate. https://crates.io/crates/finl_unicode


rurban 1385 days ago

All with two-step tables instead of range- and binary search?

That's extremely interesting, as I still favor range and binary search for most cases and use two-step tables only for normalization lookup.
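
For readers unfamiliar with the range approach, here is a minimal sketch of a membership test by binary search over sorted ranges. The table `ASCII_LETTERS` and the function `in_ranges` are hypothetical names made up for illustration, and the table covers only ASCII letters; it is not taken from any existing crate.

```rust
use std::cmp::Ordering;

// Hypothetical table: sorted, non-overlapping, inclusive code point ranges
// for one property (here: ASCII letters only, purely for illustration).
const ASCII_LETTERS: &[(u32, u32)] = &[
    (0x41, 0x5A), // 'A'..='Z'
    (0x61, 0x7A), // 'a'..='z'
];

// Returns true if `c` falls inside one of the ranges in `table`.
// Cost is O(log n) comparisons, where n is the number of ranges.
fn in_ranges(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the code point
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the code point
            } else {
                Ordering::Equal // code point is inside this range
            }
        })
        .is_ok()
}
```

A table like this answers "is this character in set X?" compactly. Asking "which category is this character in?" with one such table per category would take one search per table.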


dhosek 1384 days ago

Yes. The two-step tables are really not that expensive and they enable features not possible with range and binary search, like identifying the category of a character cheaply.
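
A rough sketch of the two-step table idea, for illustration only. This is not the actual finl_unicode code: the `Cat` enum, the `TwoStage` struct, the 16-entry block size, and the ASCII-only `ascii_classify` function are all assumptions, and a real crate would normally generate its tables at build time from the Unicode data files rather than at runtime.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cat { Other, Digit, Upper, Lower } // hypothetical, simplified categories

const BLOCK_BITS: u32 = 4;
const BLOCK_SIZE: usize = 1 << BLOCK_BITS; // 16 code points per block

struct TwoStage {
    stage1: Vec<u8>,                // block number -> index into stage2
    stage2: Vec<[Cat; BLOCK_SIZE]>, // deduplicated blocks of categories
    limit: u32,                     // code points >= limit map to Cat::Other
}

impl TwoStage {
    // `limit` is assumed to be a multiple of BLOCK_SIZE.
    fn build(limit: u32, classify: impl Fn(u32) -> Cat) -> Self {
        let mut stage1 = Vec::new();
        let mut stage2: Vec<[Cat; BLOCK_SIZE]> = Vec::new();
        for start in (0..limit).step_by(BLOCK_SIZE) {
            let mut block = [Cat::Other; BLOCK_SIZE];
            for (i, slot) in block.iter_mut().enumerate() {
                *slot = classify(start + i as u32);
            }
            // Identical blocks are stored once; this is where the space saving comes from.
            let idx = match stage2.iter().position(|b| *b == block) {
                Some(i) => i,
                None => {
                    stage2.push(block);
                    stage2.len() - 1
                }
            };
            stage1.push(idx as u8); // fine for this toy table (fewer than 256 blocks)
        }
        TwoStage { stage1, stage2, limit }
    }

    // Two array reads, no searching: high bits pick the block, low bits pick the entry.
    fn category(&self, c: char) -> Cat {
        let cp = c as u32;
        if cp >= self.limit {
            return Cat::Other;
        }
        let block = self.stage1[(cp >> BLOCK_BITS) as usize] as usize;
        self.stage2[block][(cp as usize) & (BLOCK_SIZE - 1)]
    }
}

// Hypothetical classifier used only to fill the toy table.
fn ascii_classify(cp: u32) -> Cat {
    match char::from_u32(cp) {
        Some(c) if c.is_ascii_digit() => Cat::Digit,
        Some(c) if c.is_ascii_uppercase() => Cat::Upper,
        Some(c) if c.is_ascii_lowercase() => Cat::Lower,
        _ => Cat::Other,
    }
}
```

With `let t = TwoStage::build(128, ascii_classify);`, a call like `t.category('q')` returns the category directly from two index operations, and the lookup cost does not depend on how many categories exist.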
