by twelvechairs 3687 days ago
It's a better way of doing things - you can handle data in its native format rather than having to arbitrarily convert to UTF-8 (which is itself an 'encoding').

[edit] I remember a talk where Matz was asked this specific question and tried to explain it clearly, but seemed confused as to how the questioner could have such a poor grasp of Unicode (the difference between monolingual Americans and Japanese, I guess).


String is just a thin wrapper around Vec<u8> that guarantees the bytes are valid UTF-8 and adds some convenience functions for working with them. There's nothing stopping anyone from just using Vec<u8> to handle non-UTF-8 data in its native format, nor stopping anyone from writing convenience types like String for other encodings.
Yeah, right, so Ruby has effectively just made a bunch of these (and done the hard work for you of defining how to convert between them and work with them all in similar ways), and the higher-level class which includes UTF-8 and a whole bunch of others is called 'String'. It's really what you want from a high-level language - to just work with different encodings out of the box, but not have to convert to a standard internal type (like UTF-8) to do so.
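To make that concrete, here is a minimal sketch of a Ruby String carrying its own encoding. The `describe` helper and the sample text are made up for illustration; `encoding`, `length`, `bytesize` and `encode` are standard String methods.

```ruby
# Hypothetical helper: report what a string looks like without converting it.
# Ruby answers from the encoding tag the string already carries.
def describe(str)
  { encoding: str.encoding.name, chars: str.length, bytes: str.bytesize }
end

utf8 = "\u65E5\u672C\u8A9E"              # three kanji; \u escapes produce a UTF-8 string
sjis = utf8.encode(Encoding::Shift_JIS)  # same text, stored as Shift_JIS bytes

p describe(utf8)  # UTF-8, 3 characters, 9 bytes
p describe(sjis)  # Shift_JIS, 3 characters, 6 bytes

# Character-level methods work the same way on both strings.
puts sjis.reverse.encode(Encoding::UTF_8) == utf8.reverse  # true
```

Both values are plain String objects; only the encoding tag and the underlying bytes differ.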
Well, it's hard to say, really. It depends on what you're doing. The benefit of converting to UTF-8 when making a string from bytes is that string operations have predictable performance, and strings have predictable memory usage. But of course, you then have to pay the cost of converting to UTF-8.

On the other hand, if you just track the encoding in your string type, then you don't have to pay a conversion cost at the boundary, but each encoding will have different memory usage and performance characteristics.
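A rough sketch of the two approaches in Ruby, just to show where the cost lands. The method names `normalize_to_utf8` and `tag_only` are invented for this example, and the "incoming" bytes are simulated rather than read from a real file or socket.

```ruby
# Approach 1 (hypothetical helper): convert at the boundary.
# Pays a transcoding cost once; everything downstream is UTF-8.
def normalize_to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Approach 2 (hypothetical helper): keep the native bytes and only record
# which encoding they are in. No transcoding happens here.
def tag_only(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding)
end

text = "\u65E5\u672C\u8A9E"                # three kanji as a UTF-8 string
raw  = text.encode(Encoding::UTF_16LE).b   # simulate untagged incoming bytes

converted = normalize_to_utf8(raw, Encoding::UTF_16LE)
tagged    = tag_only(raw, Encoding::UTF_16LE)

puts converted.bytesize  # 9 (UTF-8 uses 3 bytes for each of these characters)
puts tagged.bytesize     # 6 (UTF-16LE uses 2 bytes for each of them)
puts tagged.length       # 3 (still three characters either way)
```

The same text ends up with a different memory footprint depending on which approach you pick, which is the trade-off described above.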