Thank you for telling me about this - I didn't know they did that...
My first thought as I started reading the PEP was, "Why did they bother adding the 'bytes' type if 'str' is just going to be able to hold everything anyway?"
After looking at more of it though, it seems like they're storing the undecodable binary octets as code points in one of several internal Unicode representations. Moreover, they're abusing (reusing?) the range of code points reserved for 16-bit surrogate pairs, but only using the low (trailing) half of the pair. This is all clever in the bad way.
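To make that concrete for myself, here's a minimal sketch. I'm assuming the PEP in question is PEP 383 and that the mechanism is the 'surrogateescape' error handler; the sample bytes and variable names are just mine for illustration.

```python
# Latin-1 encoded bytes; 0xE9 and 0xFF are not valid UTF-8 here.
raw = b"caf\xe9 \xff"

# With surrogateescape, each undecodable byte 0xNN is smuggled into the
# str as the lone low surrogate U+DC00 + 0xNN (range U+DC80..U+DCFF).
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the round trip is lossless.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogates, so the "text" is not
# really valid Unicode text.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(escaped, round_tripped == raw, strict_ok)
```

So the 'str' can carry the bytes around and give them back intact, but only if every encode along the way opts in to the same error handler.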
This seems like a real lack of taste to me, and I doubt the Guido from 1991 would've found it acceptable to have 'str', 'bytes', and 'bytearray' the way they are. (Let's ignore that 'buffer' became 'memoryview' for now...) It used to be a simple and elegant language.