Hacker News
by chadrs 3081 days ago
This is awesome. It reminds me of when we decided to add Unicode support to our API, but our code had been connecting to MySQL over a Latin-1 connection. As long as you read back through a Latin-1 connection, everything looked correct, but what was actually being stored was the UTF-8 bytes decoded as a Latin-1 string and then re-encoded as UTF-8, since the column was UTF-8. Basically:

string.encode("utf-8").decode("latin-1").encode("utf-8")

although technically what mysql calls latin-1 is actually using Windows-1252 :(
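
For anyone who wants to see the effect, here is a minimal Python sketch of that round trip and how it can be undone. The function names `double_encode` and `repair` are made up for illustration, and it uses Python's strict "latin-1" codec rather than MySQL's Windows-1252-flavoured latin1, so it is an approximation of what the server does, not a reproduction:

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then stored as UTF-8.

    Hypothetical helper; Python's "latin-1" maps every byte to a code point,
    which is close to (but not identical to) MySQL's latin1.
    """
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair(stored: bytes) -> str:
    """Reverse the round trip to recover the original text."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")


original = "caf\u00e9"                 # "café"
stored = double_encode(original)

# What a correctly configured UTF-8 client would see in the column.
mojibake = stored.decode("utf-8")      # "cafÃ©"

# A Latin-1 connection hands back the original UTF-8 bytes,
# which is why nothing looked wrong.
via_latin1 = stored.decode("utf-8").encode("latin-1")

print(stored)           # b'caf\xc3\x83\xc2\xa9'
print(mojibake)
print(repair(stored))
```

Because Latin-1 assigns a character to every byte value, the damage is lossless and `repair` can always invert it in this sketch. With real Windows-1252 semantics a few byte values are handled differently, so test on a copy of the data first.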

1 comment

> although technically what mysql calls latin-1 is actually using Windows-1252 :(

...and what mysql calls UTF-8 is a subset that only supports code points encoded in up to three bytes, so nothing outside the Basic Multilingual Plane (emoji, for example)! To get real UTF-8 you need to use "utf8mb4". Why anybody uses mysql is beyond me.
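
A quick Python sketch of the three-byte limit, in case it helps. The helper `fits_in_three_byte_utf8` is a hypothetical name, and it only checks the code point range; it says nothing about how a particular MySQL version reacts to text that does not fit:

```python
def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every code point encodes to at most three UTF-8 bytes.

    Hypothetical helper: code points up to U+FFFF (the Basic Multilingual
    Plane) need one to three bytes; anything above needs four.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


samples = {
    "e-acute": "\u00e9",         # é
    "euro sign": "\u20ac",       # €
    "emoji": "\U0001F600",       # 😀
}

for name, ch in samples.items():
    print(name, len(ch.encode("utf-8")), fits_in_three_byte_utf8(ch))
# e-acute 2 True
# euro sign 3 True
# emoji 4 False
```

Anything that returns False here needs a utf8mb4 column and connection to be stored intact.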