by chadrs
3081 days ago
This is awesome. It reminds me of when we decided to add Unicode support to our API, but our code had been connecting to MySQL over a Latin-1 connection. As long as you read back over a Latin-1 connection, everything looked correct, but what was actually being stored was the UTF-8 bytes decoded as a Latin-1 string and then re-encoded as UTF-8, since the column was UTF-8. Basically: string.encode("utf-8").decode("latin-1").encode("utf-8"), although technically what MySQL calls latin-1 is actually Windows-1252 :(
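To make the round trip concrete, here is a minimal Python sketch of that double encoding and how it can be undone. The function names `double_encode` and `repair` are made up for illustration, and it uses Python's `latin-1` codec (true ISO-8859-1, which maps every byte) as a stand-in for MySQL's Windows-1252-flavored latin1, so bytes in the 0x80-0x9F range would come out differently on a real server.

```python
def double_encode(text: str) -> bytes:
    """Simulate writing UTF-8 text through a Latin-1 connection into a UTF-8 column."""
    # The client sends UTF-8 bytes, the server reads them as Latin-1 characters,
    # then stores those characters as UTF-8.
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair(stored: bytes) -> str:
    """Reverse the double encoding, which is what reading over Latin-1 did implicitly."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")


original = "café"
stored = double_encode(original)

print(stored)                  # b'caf\xc3\x83\xc2\xa9'
print(stored.decode("utf-8"))  # cafÃ©  (what a real UTF-8 client would see)
print(repair(stored))          # café   (what the Latin-1 client saw)
```

That is why everything looked fine until something read the column over a genuine UTF-8 connection.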
...and what MySQL calls "utf8" is a subset that only supports characters encoded in up to three bytes, i.e. nothing outside the Basic Multilingual Plane! To get real UTF-8 you need to use "utf8mb4". Why anybody uses MySQL is beyond me.
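A quick way to see which strings would survive the three-byte limit, as a rough Python sketch. The helper name `fits_in_three_bytes` is hypothetical, and it only checks encoded length per character; it does not talk to MySQL at all.

```python
def fits_in_three_bytes(text: str) -> bool:
    """Return True if every character encodes to at most 3 bytes of UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


for sample in ["café", "€", "日本語", "😀"]:
    widths = [len(ch.encode("utf-8")) for ch in sample]
    print(sample, widths, fits_in_three_bytes(sample))
```

Accented Latin letters, the euro sign, and CJK characters in the BMP all fit; emoji and other supplementary-plane characters need four bytes and so need "utf8mb4".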