| HN Mirror

by ruediger 5163 days ago

> Also UTF-8 string can't be cut at arbitrary position.

Neither can any other kind of Unicode string, because of combining characters. That's why the Unicode standard recommends algorithms for text segmentation in one of its annexes (UAX #29, "Unicode Text Segmentation").
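A small Python illustration of the combining-character problem. It assumes "é" is written as a plain "e" followed by U+0301 COMBINING ACUTE ACCENT. Cutting at a code point boundary keeps the string valid, but it can still drop part of what the reader sees as one character:

```python
import unicodedata

# "café" spelled with a combining acute accent (U+0301) after a plain "e"
s = "cafe\u0301"

print(len(s))                   # 5 code points, although a reader sees 4 characters
print(unicodedata.name(s[-1]))  # COMBINING ACUTE ACCENT

# Cutting after code point 4 silently drops the accent: "café" becomes "cafe"
print(s[:4])

# NFC happens to merge this pair into the single code point U+00E9,
# but many grapheme clusters have no precomposed form, so it is not a general fix.
print(len(unicodedata.normalize("NFC", s)))  # 4
```

Finding safe cut points in general requires grapheme-cluster segmentation as described in UAX #29. Python's standard library does not provide that.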

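(And if you really need to cut at a certain byte length, you can easily backtrack to the beginning of the sequence: skip back over continuation bytes, which all have the form 10xxxxxx, until you reach a lead byte. A lead byte is either ASCII (MSB = 0) or starts with the bits 11.)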
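The backtracking trick as a minimal Python sketch. It assumes the input is valid UTF-8, and `truncate_utf8` is a hypothetical helper name, not a standard library function:

```python
def truncate_utf8(data: bytes, max_len: int) -> bytes:
    """Cut UTF-8 encoded `data` to at most `max_len` bytes without splitting
    a multi-byte sequence. Assumes `data` is valid UTF-8 and max_len >= 0."""
    if len(data) <= max_len:
        return data
    cut = max_len
    # Continuation bytes look like 10xxxxxx. Step back over them until `cut`
    # points at a lead byte (0xxxxxxx or 11xxxxxx), i.e. the start of a sequence.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    # Everything before a lead byte is a sequence of complete characters.
    return data[:cut]


text = "añb€".encode("utf-8")  # 7 bytes: 1 + 2 + 1 + 3
print(truncate_utf8(text, 2).decode("utf-8"))  # a
print(truncate_utf8(text, 5).decode("utf-8"))  # añb
```

The result always decodes cleanly, but it only respects code point boundaries. It can still separate a base character from its combining marks, so this is a byte-level safety net rather than a substitute for text segmentation.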