| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by int_19h 2723 days ago
	Modern Python uses whatever representation is sufficient to ensure 1-unit-per-codepoint for a given string (which it can do on creation, since strings are immutable). So you get ASCII, UTF-16 sans surrogate pairs, or UTF-32. This is great for high-level code, but painful to work with from native code, because it usually needs some specific encoding to call into other libraries, and it's usually UTF-8 - so you need to re-encode all the time.