| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by nitely 1644 days ago
	CPython 3 does use UTF-32 under the hood for strings (there is bytes for plain sequence of bytes). As you say, it's the worst of both worlds. High memory usage, and not really useful if you are dealing with unicode characters (grapheme clusters). My impression is most modern languages that bother with unicode (swift, rust, nim) are using utf-8, and doing linear time operations to handle unicode. I think that's the right approach, as I don't recall ever needing random access on a unicode string.

1 comments

astrange 1643 days ago

Swift originally had UTF-32 strings (an upgrade from ObjC which uses UTF-16), but they redid String to be UTF-8. It's typically the best choice even for CJK text, because it has ASCII mixed in often enough to still be smallest.

I don't think reversing a string is a meaningful operation though; there's no reason to think of a string as a "list of characters" when it's also a "list of words" and several other things. Swift provides more than one kind of iteration for that reason.

link