by kochthesecond
2079 days ago
Nice. These three points have driven me raving mad while working with MySQL:
- The default 'latin1' character set is in fact cp1252, not ISO-8859-1, meaning it contains the extra characters in the Windows codepage. 'latin2', however, is ISO-8859-2.
- The 'utf8' character set is limited to Unicode characters that encode to 1-3 bytes in UTF-8. 'utf8mb4' was added in MySQL 5.5.3 and supports up to 4-byte encoded characters. UTF-8 has been defined to encode characters to up to 4 bytes since 2003.
- Neither the 'utf8' nor the 'utf8mb4' character set has any case-sensitive collation other than 'utf8_bin' and 'utf8mb4_bin', which sort characters by their numeric codepoint.

utf8 effectively being an alias of utf8mb3 has cost us so much work it's not even funny.
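
For anyone who wants to see the first two points concretely, here is a rough Python sketch. It only uses Python's own codecs ("cp1252", "latin-1") as stand-ins for the MySQL character sets, and `fits_in_utf8mb3` is a made-up helper name, so treat it as an illustration rather than a description of what the server does internally.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8.

    Code points up to U+FFFF encode to 1-3 bytes; anything above
    (emoji, many rarer scripts) needs 4 bytes and so does not fit
    in a 3-byte-per-character set.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


def encodable(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with the given Python codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The euro sign exists in cp1252 (byte 0x80) but not in ISO-8859-1.
print(encodable("€", "cp1252"))    # True
print(encodable("€", "latin-1"))   # False

# An emoji needs 4 bytes in UTF-8, so it fails the 3-byte check.
print(len("😀".encode("utf-8")))    # 4
print(fits_in_utf8mb3("naïve €"))  # True
print(fits_in_utf8mb3("😀"))        # False
```

Running a check like `fits_in_utf8mb3` over incoming text is one cheap way to find out whether a 'utf8' column would choke on your data before you discover it in production.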
|
An extra warning about that mess: mysqldump in many configurations will silently convert utf8mb4 down to utf8mb3. So when you're testing your backups or migrations, do an extra check to make sure that emoji and rarer characters didn't get eaten!
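
A minimal sketch of that extra check, in Python: it assumes you have already pulled the same value out of the original database and out of the restored one (how you fetch them is up to you), and it reports which 4-byte characters went missing. The function names are hypothetical. As far as I know, passing `--default-character-set=utf8mb4` to mysqldump is the usual way to avoid the downgrade, but verify that against your own version.

```python
from collections import Counter


def supplementary_chars(text: str) -> Counter:
    """Count characters above U+FFFF, i.e. those needing 4 bytes in UTF-8."""
    return Counter(ch for ch in text if ord(ch) > 0xFFFF)


def lost_in_round_trip(before: str, after: str) -> Counter:
    """Return the 4-byte characters that occur fewer times in `after`.

    Counter subtraction drops zero and negative counts, so an empty
    result means no such character disappeared.
    """
    return supplementary_chars(before) - supplementary_chars(after)


# Hypothetical values: the original row, and what came back after restore.
original = "release party 🎉🎉"
restored = "release party ??"

lost = lost_in_round_trip(original, restored)
if lost:
    print("characters lost in dump/restore:", dict(lost))
```

This only catches characters that vanished or were replaced; it says nothing about columns that never contained a 4-byte character, so seed your test data with a few emoji on purpose.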