by pdonis 2810 days ago
> No it doesn’t. The whole range of 128-159 are undefined.

Not in the sense of being decodable. If you decode a byte string with Latin-1, you get a Unicode string containing only code points 0-255, each code point equal to the numerical value of the corresponding byte in the byte string. So you can recover exactly the original byte string by re-encoding. Plus, every possible byte string is valid for decoding in Latin-1, so you will never get a decode error, and re-encoding the result will never fail either as long as you haven't added any code points above 255. As long as you don't care about the semantic meaning of bytes 128-255, this allows you to preserve the data while still working with Unicode strings.
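
To make the round trip concrete, here is a minimal sketch, assuming Python 3 and its built-in "latin-1" codec. The variable names (data, text, utf8_ok) are made up for illustration. It checks that every byte value decodes to the code point with the same number, that re-encoding gives back the original bytes, and that the same bytes are rejected by UTF-8.

```python
# Every possible byte value, 0 through 255, including the 128-159 range.
data = bytes(range(256))

# Decoding with Latin-1 cannot fail: each byte becomes the code point
# with the same numerical value.
text = data.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))

# Re-encoding the unmodified string recovers the original bytes exactly.
assert text.encode("latin-1") == data

# For contrast, the same bytes are not valid UTF-8, so decoding raises.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The round trip only holds while the string stays within code points 0-255. If you insert something like "\u20ac" into text, encoding back to Latin-1 will raise UnicodeEncodeError.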