by naringas 2481 days ago
from the article:

CPython since 3.3 makes the same idea three-level with UTF-32 semantics: Strings are stored as UTF-32 if at least one character has a non-zero bit in its most-significant half. Else if at least one character has a non-zero bit in its second-least-significant 8 bits, the string is stored as UCS2 (i.e. UTF-16 excluding surrogate pairs). Otherwise, the string is stored as Latin1.
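
You can see this from the interpreter. Below is a rough sketch, assuming CPython 3.3 or later (sys.getsizeof reports implementation-specific sizes, so other interpreters may behave differently). The helper names storage_width and measured_width are made up for illustration and are not from the article.

```python
import sys


def storage_width(s: str) -> int:
    """Predict bytes per character from the largest code point in s."""
    top = max(map(ord, s), default=0)
    if top < 0x100:        # fits in 8 bits: Latin1
        return 1
    if top < 0x10000:      # fits in 16 bits: UCS2
        return 2
    return 4               # needs the upper half: UTF-32


def measured_width(ch: str, n: int = 1000) -> int:
    """Measure bytes per character by growing a string of one repeated
    character by one and comparing the reported sizes (CPython only)."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


if __name__ == "__main__":
    for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", storage_width(ch), measured_width(ch))
```

Note that the width is chosen per string, not per character: a single character above U+FFFF makes the whole string use 4 bytes per character.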