by t_hozumi
4647 days ago
I think there is still a fundamental problem with string encoding: a decoder cannot know which encoding a byte stream was written in without additional information.
That information is often lost or omitted, as you can see on the web. In that situation, all a decoder can do is guess, which is why we still suffer from mojibake. One possible solution would have been to prepend one or two bytes of encoding information to the head of the byte stream. For example: UTF-8 = 0b00000001, UTF-16 = 0b00000010, Shift_JIS = 0b00000011, EUC-JP = 0b00000100, and so on. Of course, this is not a realistic solution, because everyone would have to switch their encoders and decoders to this protocol at once.
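To make the idea concrete, here is a rough Python sketch of such a one-byte prefix. The tag values are just the ones from my example above, and the names `ENCODING_TAGS`, `encode_tagged`, and `decode_tagged` are made up for illustration; none of this is an existing standard or library API.

```python
# Hypothetical tag values from the example above; not a real standard.
ENCODING_TAGS = {
    "utf-8": 0b00000001,
    "utf-16": 0b00000010,
    "shift_jis": 0b00000011,
    "euc_jp": 0b00000100,
}
TAG_TO_ENCODING = {tag: name for name, tag in ENCODING_TAGS.items()}


def encode_tagged(text: str, encoding: str) -> bytes:
    """Encode text and prepend a one-byte tag naming the encoding."""
    tag = ENCODING_TAGS[encoding]  # KeyError if the encoding has no tag
    return bytes([tag]) + text.encode(encoding)


def decode_tagged(data: bytes) -> str:
    """Read the leading tag byte and decode the rest accordingly."""
    if not data:
        raise ValueError("empty input has no encoding tag")
    tag, payload = data[0], data[1:]
    if tag not in TAG_TO_ENCODING:
        raise ValueError(f"unknown encoding tag: {tag:#04x}")
    return payload.decode(TAG_TO_ENCODING[tag])


if __name__ == "__main__":
    original = "文字化け"
    for name in ENCODING_TAGS:
        tagged = encode_tagged(original, name)
        # The decoder never has to guess: the first byte says what to use.
        assert decode_tagged(tagged) == original
```

The sketch also shows the weakness: `decode_tagged` only works on data produced by `encode_tagged`, so any untagged byte stream from an existing system would be misread or rejected.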