by arcticbull
2239 days ago
Go's version and the Rust version differ in yet more subtle ways. Go's "rune" type is effectively a Code Point (it is an alias for int32), but Rust's "char" type is a Unicode Scalar Value: a subset of Code Points that excludes the surrogate code points (U+D800 to U+DFFF). Neither version will work with complex Unicode input unless you both segment the text by Grapheme Cluster [1] and apply a consistent Normalization form [2] before comparing clusters. Unicode is hard, fams, and it's rare that anything that looks easy is actually what you want.

[1] https://unicode.org/reports/tr29/
[2] http://unicode.org/reports/tr15/
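A small sketch of the Code Point vs. Scalar Value distinction, using only Rust's standard library. `char::from_u32` refuses any value in the surrogate range (and anything above U+10FFFF), whereas a Go rune, being an int32, can hold such a value. The helper name `is_scalar_value` is made up for illustration.

```rust
/// Hypothetical helper: true if `cp` is a Unicode Scalar Value, i.e. a
/// code point that is not a surrogate (U+D800..=U+DFFF). This is exactly
/// the set of values Rust's `char` can represent.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

fn main() {
    // An ordinary code point converts to a `char`.
    assert_eq!(char::from_u32(0x41), Some('A'));
    // A lone surrogate is a code point, but not a scalar value.
    assert_eq!(char::from_u32(0xD800), None);
    // Past the end of the Unicode code space.
    assert_eq!(char::from_u32(0x110000), None);
    println!("{}", is_scalar_value(0xDFFF)); // prints "false"
}
```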
Doing this is quite easy in Rust.
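Rust's standard library does not itself segment grapheme clusters or normalize text; in practice that is usually done with third-party crates such as unicode-segmentation (`UnicodeSegmentation::graphemes`) and unicode-normalization (`UnicodeNormalization::nfc`). The std-only sketch below just shows the failure modes those steps guard against: the same visible "é" in two encodings compares unequal, and reversing by `char` moves a combining accent onto the wrong letter. The constant and function names are illustrative only.

```rust
// The same rendered text "é", encoded two ways.
const PRECOMPOSED: &str = "\u{e9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const DECOMPOSED: &str = "e\u{301}"; // 'e' + U+0301 COMBINING ACUTE ACCENT

/// Hypothetical naive reversal by `char` (scalar value), as a rune/char-based
/// approach would do. It ignores grapheme cluster boundaries.
fn reverse_by_char(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // Without normalization these compare unequal even though they look identical.
    assert_ne!(PRECOMPOSED, DECOMPOSED);
    assert_eq!(PRECOMPOSED.chars().count(), 1);
    assert_eq!(DECOMPOSED.chars().count(), 2);

    // Reversing "e\u{301}x" by char detaches the accent from 'e' and puts it after 'x'.
    let reversed = reverse_by_char("e\u{301}x");
    assert_eq!(reversed, "x\u{301}e");
    println!("{}", reversed);
}
```

A grapheme-aware version would reverse whole clusters (keeping "e\u{301}" together), and a normalized comparison would treat the two encodings of "é" as equal.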