by niccaluim
3550 days ago
FWIW the Unicode spec describes combining marks as characters in their own right. So if the intent is to reverse characters, page 21 does the job. The resulting sequences will potentially be defective, but not ill-formed. That being said, an FAQ on combining characters points out that Unicode's definition of "character" may not match an end user's, and that it's best to use the word "grapheme" instead for clarity. (And that being said, if the typical end user knows what "grapheme" means, I'll eat my cat.) So from a practical standpoint, it's best to make sure that any input to rev is in one of the composed normalization forms (NFC or NFKC). (Incidentally, the proper sequence is <base character><combining character>…, not the other way around.)
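To make the normalization point concrete, here is a minimal Python sketch. It is not what rev itself does; `rev_nfc` is a hypothetical helper name, and the only library call is the standard `unicodedata.normalize`.

```python
import unicodedata

def rev_nfc(s: str) -> str:
    """Hypothetical helper: compose to NFC first, then reverse code points."""
    return unicodedata.normalize("NFC", s)[::-1]

# 'e' followed by U+0301 COMBINING ACUTE ACCENT (base first, then the mark)
decomposed = "cafe\u0301"

# Plain code point reversal puts the combining mark first, with no base
# character in front of it: a defective sequence, though still well-formed.
naive = decomposed[::-1]

# After NFC, 'e' + U+0301 becomes the single code point U+00E9, so the
# reversal keeps the accent on the 'e'.
safe = rev_nfc(decomposed)
```

Here `naive` starts with the bare U+0301, while `safe` is "\u00e9fac". This only helps when a precomposed form exists for every base-plus-mark pair in the input.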
|
But there are real-world characters that don't have precomposed forms (IIRC, e.g., in Indic scripts).
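A rough Python sketch of that case, assuming a Devanagari consonant plus vowel sign as the example. `rev_clusters` is a hypothetical helper that keeps each base together with the marks that follow it (any general category starting with "M"). It only approximates real grapheme cluster segmentation and does not handle virama conjuncts, ZWJ sequences, or Hangul jamo.

```python
import unicodedata

def rev_clusters(s: str) -> str:
    """Hypothetical helper: reverse base+mark groups instead of code points."""
    clusters = []
    for ch in s:
        # Mn, Mc and Me are the mark categories; attach them to the previous base.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

# U+0915 KA + U+093F VOWEL SIGN I, then U+0924 TA
word = "\u0915\u093f\u0924"

# NFC has nothing to compose here, so normalizing first does not help.
unchanged = unicodedata.normalize("NFC", word) == word

# Code point reversal moves the vowel sign onto the wrong consonant.
naive = word[::-1]

# Cluster-aware reversal keeps KA and its vowel sign together.
grouped = rev_clusters(word)
```

Here `unchanged` is True, `naive` is "\u0924\u093f\u0915" (the vowel sign now follows TA), and `grouped` is "\u0924\u0915\u093f".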