Hacker News new | ask | show | jobs
by ks2048 160 days ago
Do the delimiters have to be single bytes? e.g. Japanese full stop (IDEOGRAPHIC FULL STOP) is 3 bytes in UTF-8.
1 comments

No, delimiters can be multiple bytes. They have to be passed as a pattern.

// With multi-byte pattern

let metaspace = "<japanese_full_stop>".as_bytes();

let chunks: Vec<&[u8]> = chunk(text).pattern(metaspace).prefix().collect();