Hacker News new | ask | show | jobs
by int_19h 791 days ago
The simple thing to remember is that for all versions of Python going back 12 years, there's no such thing as "default encoding of string". A Python string is defined as a sequence of 32-bit Unicode codepoints, and that is how Python code perceives it in all respects. How it is stored internally is an implementation detail that does not affect you.
1 comments

32 bit specifically?

The most expansive Unicode has ever been was 31 bits, and UTF-8 is also capable of at most 31 bits.

You're right, the docs just say "Unicode codepoints", and standard facilities like "\U..." or chr() will refuse anything above U+10FFFF. However I'm not sure that still holds true when third-party native modules are in the picture.