by rtpg
4265 days ago
The thing is, if you build for Unicode support from the start, these conversations don't need to be had. The problem is that not enough people treat text as a black box from the start (I can understand unwillingness to support bigger things like RTL).
|
I'd be willing to bet money that at least some of the formats in question aren't UTF-8; they are likely extended-ASCII text encoded against a specific character set or code page.
Then you have to read that code page, convert the necessary characters to their Unicode equivalents, and from there do you encode down to UTF-8?
Does the language this library is written in support that translation? Are there modules to do that? Is the license for those modules necessarily compatible?
Who's going to go through the different document versions to confirm, and adjust for, the various encodings of non-ASCII characters?
It's not as simple as saying "don't choke on Unicode".
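
To make the code page step concrete, here is a minimal sketch in Python, assuming the legacy bytes are Windows-1252 (that choice is purely an assumption for illustration; the thread doesn't say which formats or code pages are involved). The helper name `transcode_to_utf8` is hypothetical. It decodes against the code page to get Unicode text, then encodes that as UTF-8:

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode legacy bytes against a code page, then re-encode as UTF-8.

    source_encoding is an assumption the caller has to supply; the bytes
    themselves do not say which code page they were written in.
    """
    text = raw.decode(source_encoding)  # code page bytes -> Unicode string
    return text.encode("utf-8")         # Unicode string -> UTF-8 bytes


legacy = b"caf\xe9"  # "café" if the bytes are Windows-1252

as_utf8 = transcode_to_utf8(legacy)

# The same bytes read against a different code page give different text,
# which is why someone has to confirm the encoding for each document version.
as_cp1252 = legacy.decode("cp1252")
as_cp437 = legacy.decode("cp437")
```

The mechanical part is two calls when the standard library ships the codec. The hard part is the one raised above: knowing which `source_encoding` is correct for each document.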