HN Mirror



	by _ache_ 791 days ago
	Thank you! The documentation was misleading about the "default encoding of string".


int_19h 791 days ago

The simple thing to remember is that for all versions of Python going back 12 years, there's no such thing as "default encoding of string". A Python string is defined as a sequence of 32-bit Unicode codepoints, and that is how Python code perceives it in all respects. How it is stored internally is an implementation detail that does not affect you.
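Here is a small Python 3 sketch of what "a sequence of code points" means in practice. `len()`, indexing and `ord()` all operate on code points, and byte counts only exist after you pick an encoding with an explicit `.encode()`. The sample string is arbitrary. CPython's internal storage (PEP 393 uses 1, 2 or 4 bytes per code point depending on the string's contents) is deliberately not inspected, because it is the implementation detail the comment says you can ignore.

```python
# A str is a sequence of code points; no encoding is involved until
# you convert to bytes yourself.
s = "a\u00e9\U0001F600"  # 'a', 'é', and an emoji outside the BMP

print(len(s))                    # 3 code points, not 3 bytes or UTF-16 units
print([hex(ord(c)) for c in s])  # ['0x61', '0xe9', '0x1f600']
print(s[2] == "\U0001F600")      # True: indexing never yields a surrogate half

# Byte lengths depend entirely on the encoding chosen at this point.
print(len(s.encode("utf-8")))     # 1 + 2 + 4 = 7
print(len(s.encode("utf-16-le"))) # 2 + 2 + 4 (surrogate pair) = 8
print(len(s.encode("utf-32-le"))) # 4 * 3 = 12
```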


Dylan16807 791 days ago

32 bit specifically?

The most expansive Unicode has ever been was 31 bits, and UTF-8 is also capable of at most 31 bits.
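The 31-bit figure for UTF-8 comes from the original multi-byte layout (RFC 2279, sequences of up to 6 bytes), which RFC 3629 later cut to 4 bytes to match the U+10FFFF ceiling. The sketch below just does the bit arithmetic; `payload_bits` is a hypothetical helper written for illustration.

```python
# Payload bits in a UTF-8 sequence of n bytes under the original
# RFC 2279 layout. A 1-byte sequence is 0xxxxxxx. For n >= 2 the lead
# byte spends n+1 bits on its length marker (n ones and a zero), and
# each continuation byte spends 2 bits on its "10" prefix.
def payload_bits(n: int) -> int:
    if n == 1:
        return 7
    return (8 - (n + 1)) + 6 * (n - 1)

for n in range(1, 7):
    print(n, payload_bits(n))  # prints 1 7, 2 11, 3 16, 4 21, 5 26, 6 31

print((0x10FFFF).bit_length())    # 21: current Unicode ceiling, fits in 4 bytes
print((0x7FFFFFFF).bit_length())  # 31: the old 6-byte UTF-8 ceiling
```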


int_19h 790 days ago

You're right, the docs just say "Unicode codepoints", and standard facilities like "\U..." or chr() will refuse anything above U+10FFFF. However, I'm not sure that still holds when third-party native modules are in the picture.
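A quick way to see the limit from pure Python, as a sketch that only covers the standard built-ins mentioned above (it says nothing about what native extension modules might do): `sys.maxunicode` reports the ceiling, `chr()` raises `ValueError` one past it, and an out-of-range `\U` escape fails when the source is compiled.

```python
import sys

print(hex(sys.maxunicode))            # 0x10ffff in Python 3.3 and later
print(chr(0x10FFFF) == "\U0010FFFF")  # True: the last valid code point

# One past the ceiling is rejected by chr() at runtime...
try:
    chr(0x110000)
except ValueError as exc:
    print("chr rejected it:", exc)

# ...and by the \U escape when the literal is compiled.
try:
    compile('"\\U00110000"', "<example>", "eval")
except SyntaxError as exc:
    print("literal rejected it:", type(exc).__name__)
```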
