It's less that it's "hard" and more that it's generally not a sensible operation unless performed on a known, well-restricted domain (or if you're up for flipping an image of rendered text, I guess).
Combining characters are the most obvious problem.
Both of these render identically as "naïve"; however, the first is written with "ï" as a single code point, while the second is an "i" followed by a combining dieresis. When reversed, the dieresis in the first example correctly stays attached to the i, while in the second it incorrectly moves to the v. To do it right, you have to scan through the string and keep each base character together with all of its combining characters, in their original order.
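A minimal sketch of that scan in Python, in case it helps make the idea concrete. The function name `reverse_keeping_marks` is made up for illustration, and the approach assumes that "combining character" means "non-zero canonical combining class" as reported by `unicodedata.combining`. That covers the combining dieresis, but it is not full grapheme-cluster segmentation (it will still mangle things like emoji joined with a zero-width joiner), so treat it as a demonstration rather than a general solution.

```python
import unicodedata


def reverse_keeping_marks(text: str) -> str:
    """Reverse text while keeping combining marks after their base character.

    Simplifying assumption: a code point is a combining mark if its
    canonical combining class is non-zero.
    """
    clusters = []
    for ch in text:
        if unicodedata.combining(ch) and clusters:
            # Attach the mark to the cluster of the preceding base character.
            clusters[-1] += ch
        else:
            # Start a new cluster (also handles a stray leading mark).
            clusters.append(ch)
    # Reverse the order of clusters, not of individual code points.
    return "".join(reversed(clusters))


precomposed = "na\u00efve"   # "ï" as the single code point U+00EF
decomposed = "nai\u0308ve"   # "i" followed by U+0308 COMBINING DIAERESIS

naive_reverse = decomposed[::-1]                 # U+0308 now follows the "v"
safe_reverse = reverse_keeping_marks(decomposed)  # U+0308 still follows the "i"
```

Here `naive_reverse` puts the dieresis over the v, while `safe_reverse` keeps it over the i. The precomposed string reverses the same way under either method, since its "ï" is a single code point.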
Next question. :)