| HN Mirror



	by twelvechairs 3687 days ago
	It's a better way of doing things: you can handle things in their native format rather than having to arbitrarily convert to UTF-8 (which is an 'encoding' itself). [edit] I remember a talk where Matz was asked this specific question and tried to explain it clearly, but seemed confused as to how the questioner could have such a poor grasp of Unicode (the difference between monolingual Americans and Japanese, I guess).


kibwen 3687 days ago

String is essentially a wrapper around Vec<u8> that guarantees its contents are valid UTF-8, with some extra convenience functions for working with UTF-8. There's nothing stopping anyone from just using Vec<u8> to handle non-UTF-8 data in its native format, nor stopping anyone from writing convenience types like String for other encodings.
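A minimal sketch of that relationship, assuming the comment is about Rust's standard `String` and `Vec<u8>`. The helper names `bytes_to_string` and `string_to_bytes` are made up for illustration; `String::from_utf8` and `String::into_bytes` are the standard library calls doing the work.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: try to treat raw bytes as UTF-8 text.
/// Returns an error if the bytes are not valid UTF-8, in which case
/// the caller can keep working with the original bytes instead.
fn bytes_to_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Hypothetical helper: hand back the byte buffer that backs a String.
/// This consumes the String and does not copy the data.
fn string_to_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

For example, the single byte `0xE9` (Latin-1 "é") is rejected by `bytes_to_string`, but it can sit in a plain `Vec<u8>` untouched for as long as you like.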


twelvechairs 3687 days ago

Yeah, right, so Ruby has effectively just made a bunch of these (and done the hard work for you of defining how to convert between them and work with them all in similar ways), and the higher-level class which includes UTF-8 and a whole bunch of others is called 'String'. It's really what you want from a high-level language: to just work with different encodings out of the box, without having to convert to a standard internal type (like UTF-8) to do so.


wtetzner 3687 days ago

Well, it's hard to say, really. It depends on what you're doing. The benefit of converting to UTF-8 when making a string from bytes is that string operations have predictable performance, and strings have predictable memory-usage. But of course, you then have to pay the cost of converting to UTF-8.

On the other hand, if you just track the encoding in your string type, then you don't have to pay a conversion cost at the boundary, but each encoding will have different memory-usage and performance characteristics.
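A rough Rust sketch of the second approach, to make the trade-off concrete. Everything here (`Encoding`, `TaggedString`, `char_count`, `to_utf8`) is a hypothetical illustration supporting only two encodings; it is not how Ruby or any real library implements this.

```rust
/// Hypothetical set of supported encodings (just two, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Utf8,
    Latin1,
}

/// Hypothetical string type that keeps bytes in their native encoding
/// and records which encoding that is, instead of converting up front.
struct TaggedString {
    bytes: Vec<u8>,
    encoding: Encoding,
}

impl TaggedString {
    /// Counting characters costs O(1) for Latin-1 (one byte per character)
    /// but needs a full scan for UTF-8 (variable-width characters).
    fn char_count(&self) -> usize {
        match self.encoding {
            Encoding::Latin1 => self.bytes.len(),
            Encoding::Utf8 => String::from_utf8_lossy(&self.bytes).chars().count(),
        }
    }

    /// The conversion cost is only paid when someone asks for UTF-8.
    fn to_utf8(&self) -> String {
        match self.encoding {
            // Each Latin-1 byte maps directly to the code point of the same value.
            Encoding::Latin1 => self.bytes.iter().map(|&b| b as char).collect(),
            // Invalid sequences are replaced with U+FFFD in this sketch.
            Encoding::Utf8 => String::from_utf8_lossy(&self.bytes).into_owned(),
        }
    }
}
```

With this sketch, "café" stored as Latin-1 takes 4 bytes and the same text stored as UTF-8 takes 5, so the same operation has different memory and time costs depending on the tag.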
