Hacker News

by zzo38computer 636 days ago

I think GNU iconv also uses Unicode internally, although its interface does not require it. So it would be possible to modify the implementation, without changing the interface, so that it uses a direct conversion where possible instead of going through Unicode (unless you are deliberately converting from or to Unicode).

A better way to handle character encoding conversion is this: each character encoding consists of a specific character set plus an encoding function and a decoding function. Conversion is then done between character sets. A direct conversion is normally better, if one is available, although sometimes an indirect conversion is also possible. (To convert JIS to TRON, an indirect conversion is unlikely to be useful; a direct conversion, however, is not too difficult, since I have implemented one before, and would be much more useful.)
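
A minimal sketch of the model described above, assuming a few things that are not in the comment: characters are represented as integer code points within their own character set, converters between character sets are kept in a registry keyed by (source, target), and a breadth-first search picks the route, which means a direct converter is always preferred over an indirect chain when both exist. The character sets `TOY-A`, `TOY-Z` and `UCS` and all function names are hypothetical stand-ins; this does not implement real JIS or TRON tables.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

CodePoints = List[int]  # characters as code points within ONE character set
Converter = Callable[[CodePoints], CodePoints]


@dataclass
class Encoding:
    """An encoding: a character set plus byte-level decode/encode functions."""
    charset: str
    decode: Callable[[bytes], CodePoints]
    encode: Callable[[CodePoints], bytes]


# Hypothetical registry of charset-to-charset converters.
converters: Dict[Tuple[str, str], Converter] = {}


def find_path(src: str, dst: str) -> List[str]:
    """Breadth-first search over registered converters. All one-step
    (direct) conversions are tried before any two-step chain."""
    if src == dst:
        return [src]
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        for a, b in converters:
            if a != path[-1] or b in seen:
                continue
            if b == dst:
                return path + [b]
            seen.add(b)
            queue.append(path + [b])
    raise LookupError(f"no conversion from {src} to {dst}")


def convert(data: bytes, src: Encoding, dst: Encoding) -> bytes:
    """Decode, convert between character sets along the chosen path,
    then encode with the target encoding."""
    chars = src.decode(data)
    path = find_path(src.charset, dst.charset)
    for a, b in zip(path, path[1:]):
        chars = converters[(a, b)](chars)
    return dst.encode(chars)


# Hypothetical toy character sets, one byte per character:
# TOY-A: code points 0..25 are the letters A..Z.
# TOY-Z: code points 0..25 are the letters Z..A (reversed).
toy_a = Encoding("TOY-A", decode=list, encode=bytes)
toy_z = Encoding("TOY-Z", decode=list, encode=bytes)
# UCS code points carried in UTF-8.
ucs_utf8 = Encoding(
    "UCS",
    decode=lambda b: [ord(c) for c in b.decode("utf-8")],
    encode=lambda cps: "".join(map(chr, cps)).encode("utf-8"),
)

# At first only an indirect route exists: TOY-A -> UCS -> TOY-Z.
converters[("TOY-A", "UCS")] = lambda cps: [ord("A") + c for c in cps]
converters[("UCS", "TOY-Z")] = lambda cps: [ord("Z") - c for c in cps]
print(find_path("TOY-A", "TOY-Z"))  # ['TOY-A', 'UCS', 'TOY-Z']
print(convert(bytes([0, 1, 2]), toy_a, toy_z))  # b'\x19\x18\x17'

# Registering a direct converter makes the search prefer it.
converters[("TOY-A", "TOY-Z")] = lambda cps: [25 - c for c in cps]
print(find_path("TOY-A", "TOY-Z"))  # ['TOY-A', 'TOY-Z']
```

Because decoding, charset conversion and encoding are separate steps, adding a new encoding of an existing character set needs no new converters.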

Furthermore, there may be more than one way to convert between two character sets, depending on the application and other factors, such as which character properties are intended. Other options are sometimes wanted as well, e.g. how to handle invalid encodings, invalid code points, ambiguous conversions, and sequences of characters that can be encoded in more than one way (although one of those ways may be "canonical").
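
For comparison, Python's standard codecs expose several of these choices as per-call options. The sketch below uses only the real `errors` argument of `bytes.decode`/`str.encode` and `unicodedata.normalize`; it illustrates the kinds of options listed above in Unicode terms and is not a proposal for how a JIS or TRON converter should expose them.

```python
import unicodedata

bad = b"abc\xffdef"  # 0xFF never occurs in well-formed UTF-8

# Invalid input: the caller picks the policy with the `errors` argument.
replaced = bad.decode("utf-8", errors="replace")  # 'abc\ufffddef'
dropped = bad.decode("utf-8", errors="ignore")    # 'abcdef'
try:
    bad.decode("utf-8")  # default policy is errors="strict", which raises
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A character the target cannot represent: again a per-call choice.
as_ref = "\u00e9".encode("ascii", errors="xmlcharrefreplace")  # b'&#233;'
as_esc = "\u00e9".encode("ascii", errors="backslashreplace")   # b'\\xe9'

# The same text encoded two ways; NFC picks the composed form as canonical.
composed = unicodedata.normalize("NFC", "e\u0301")    # U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # 'e' + U+0301
```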

(There is also the question of whether you need to convert the character set at all; in some cases only the encoding needs to be converted. For example, if you already have fonts for the proper character set, a conversion may be unnecessary. Nevertheless, the ability to convert is useful, so it is helpful to have programs that can do it, for the cases where conversion is needed.)
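
EUC-JP and Shift_JIS are one example where only the encoding differs: both carry the JIS X 0208 character set, so each character keeps its row/cell pair and the conversion is pure byte arithmetic with no tables and no trip through Unicode. The function below is a hypothetical sketch that handles only ASCII and two-byte JIS X 0208 sequences; half-width katakana (`0x8E` prefix) and JIS X 0212 (`0x8F` prefix) are rejected rather than converted. Python's built-in `euc_jp` and `shift_jis` codecs appear only to check the result.

```python
def euc_jp_to_shift_jis(data: bytes) -> bytes:
    """Re-encode EUC-JP as Shift_JIS directly (hypothetical sketch).

    The JIS X 0208 row (j1) and cell (j2) are unchanged; only the byte
    layout differs. Only ASCII and two-byte JIS X 0208 are supported.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:  # ASCII bytes are identical in both encodings
            out.append(b1)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated sequence at byte {i}")
        b2 = data[i + 1]
        if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
            raise ValueError(f"unsupported sequence at byte {i}")
        j1, j2 = b1 & 0x7F, b2 & 0x7F  # JIS row and cell, 0x21..0x7E
        # Two JIS rows share one Shift_JIS lead byte; 0xA0..0xDF is skipped.
        s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
        if j1 % 2:  # odd rows: trail bytes 0x40..0x9E, skipping 0x7F
            s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
        else:       # even rows: trail bytes 0x9F..0xFC
            s2 = j2 + 0x7E
        out += bytes((s1, s2))
        i += 2
    return bytes(out)


sample = "\u65e5\u672c".encode("euc_jp")  # two kanji, b'\xc6\xfc\xcb\xdc'
print(euc_jp_to_shift_jis(sample) == "\u65e5\u672c".encode("shift_jis"))  # True
```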

I will look at them more thoroughly later (I have only partly done so, not having had time yet) to see whether I can contribute support for TRON (and possibly other character sets). Depending on how it is implemented, it might be easy or difficult to change it to do such things.