by chc
5187 days ago
How about this simplified version:

1. Try both byte orders.
2. If one produces valid text and the other does not, choose that one (this will get you the correct answer almost every time, even if the source text is Chinese).
3. If both happen to produce valid text, use the one with the fewest scripts.

(Note that this just determines byte order, while Patrick was talking about the more ambitious task of heuristically determining whether a random string of bytes is text and, if so, what encoding it is. My point is just that you really don't need to be told the order of the bytes in most cases.)
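For anyone who wants to play with it, here is a rough Python sketch of those three steps. It is only an illustration under a few assumptions of mine, not a vetted detector: it assumes the input is UTF-16 without a BOM, it treats "valid text" as "decodes strictly and contains no unassigned or private-use code points", and it approximates "script" by the first word of each letter's Unicode name, since the standard library does not expose the Script property. The function names are made up for this example.

```python
import unicodedata


def decode_or_none(data: bytes, encoding: str):
    """Return the decoded text, or None if this byte order does not give plausible text."""
    try:
        # Strict decoding rejects unpaired surrogates and odd-length input.
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return None
    # Assumption: unassigned (Cn) or private-use (Co) code points mean "not valid text".
    if any(unicodedata.category(ch) in ("Cn", "Co") for ch in text):
        return None
    return text


def script_count(text: str) -> int:
    """Count distinct scripts, approximated by the first word of each letter's name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return len(scripts)


def guess_utf16_byte_order(data: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None if neither order decodes as text."""
    # Step 1: try both byte orders.
    candidates = {}
    for enc in ("utf-16-le", "utf-16-be"):
        text = decode_or_none(data, enc)
        if text is not None:
            candidates[enc] = text
    if not candidates:
        return None
    # Step 2: if only one order is valid, it is the only candidate left.
    # Step 3: otherwise pick the order with the fewest scripts
    # (a tie falls back to little-endian, which is an arbitrary choice).
    return min(candidates, key=lambda enc: script_count(candidates[enc]))


if __name__ == "__main__":
    print(guess_utf16_byte_order("Hello, world".encode("utf-16-be")))
    print(guess_utf16_byte_order("中文".encode("utf-16-be")))
```

The tie-break in step 3 is the weak spot: short inputs can decode to a single script in both orders, so a real implementation would want a better tie-breaker than "prefer little-endian".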
|
Try saving a text file in Windows XP Notepad with the words "Bush hid the facts" and nothing else. Close it and open the file again. WTF Chinese characters! Conspiracy!
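You can reproduce the effect without Notepad. The usual explanation (which I'm taking as an assumption here, not something verified against Notepad itself) is that its encoding guess treats the 18 ASCII bytes as UTF-16 little-endian, so each pair of bytes becomes one character. A small Python sketch of that misreading:

```python
# The file on disk is plain ASCII: 18 bytes, an even length.
data = "Bush hid the facts".encode("ascii")

# Misread the same bytes as UTF-16 little-endian: every two bytes form one code unit.
misread = data.decode("utf-16-le")

print(misread)
# Show the code points; each one lands in the CJK Unified Ideographs block (U+4E00..U+9FFF).
print([hex(ord(ch)) for ch in misread])
```

For example, the first two bytes are 0x42 ("B") and 0x75 ("u"), which read little-endian give U+7542, a CJK ideograph. No conspiracy needed, just a byte-pair coincidence.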