by kibwen 3687 days ago
String is essentially just a thin wrapper around Vec<u8> with some extra convenience functions for working with UTF-8. There's nothing stopping anyone from using Vec<u8> directly to handle non-UTF-8 data in its native format, nor from writing convenience types like String for other encodings.
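To make that concrete, here's a rough sketch of what such a convenience type could look like. `Latin1String` and `bytes_to_string` are names made up for this example; they are not part of the standard library. Only `String::from_utf8` is the real std API.

```rust
/// Hypothetical newtype for Latin-1 (ISO-8859-1) text, analogous to how
/// String wraps a Vec<u8> that is guaranteed to hold valid UTF-8.
pub struct Latin1String(Vec<u8>);

impl Latin1String {
    /// Every byte sequence is valid Latin-1, so construction cannot fail.
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        Latin1String(bytes)
    }

    /// Borrow the data in its native encoding, with no conversion.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// Convert to UTF-8 on demand. Each Latin-1 byte maps to the Unicode
    /// code point with the same numeric value.
    pub fn to_utf8(&self) -> String {
        self.0.iter().map(|&b| b as char).collect()
    }
}

/// Raw bytes can stay in a Vec<u8> until you actually need a String.
/// String::from_utf8 validates the bytes and reuses the same allocation.
pub fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

For example, `Latin1String::from_bytes(vec![0x63, 0x61, 0x66, 0xE9]).to_utf8()` gives "café", while `bytes_to_string(vec![0xFF])` gives `None` because 0xFF is never valid UTF-8.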

Yeah, right, so Ruby has effectively just made a bunch of these (and done the hard work for you of defining how to convert between them and work with them all in similar ways), and the higher-level class that covers UTF-8 and a whole bunch of others is called 'String'. It's really what you want from a high-level language: to just work with different encodings out of the box, without having to convert to a standard internal type (like UTF-8) to do so.
Well, it's hard to say, really. It depends on what you're doing. The benefit of converting to UTF-8 when making a string from bytes is that string operations have predictable performance, and strings have predictable memory usage. But of course, you then have to pay the cost of converting to UTF-8.

On the other hand, if you just track the encoding in your string type, then you don't have to pay a conversion cost at the boundary, but each encoding will have different memory usage and performance characteristics.
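Here's a minimal sketch of the "track the encoding" approach, just to show where the costs move. `Encoding` and `TaggedString` are hypothetical names invented for this example (this is not how Ruby implements it, and a real version would validate input and support far more encodings).

```rust
/// Hypothetical encoding tag; a real implementation would cover many more.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Encoding {
    Utf8,
    Latin1,
    Utf16Le,
}

/// Hypothetical string type that keeps its bytes in their original encoding.
pub struct TaggedString {
    bytes: Vec<u8>,
    encoding: Encoding,
}

impl TaggedString {
    /// No conversion and no validation at the boundary: just store the tag.
    pub fn new(bytes: Vec<u8>, encoding: Encoding) -> Self {
        TaggedString { bytes, encoding }
    }

    /// Memory usage depends on the encoding the data arrived in.
    pub fn byte_len(&self) -> usize {
        self.bytes.len()
    }

    /// Counting code points costs something different per encoding.
    /// Returns None if the bytes are not valid for the tagged encoding.
    pub fn char_count(&self) -> Option<usize> {
        match self.encoding {
            // One byte per character, so this is O(1).
            Encoding::Latin1 => Some(self.bytes.len()),
            // Variable width: has to scan the whole buffer.
            Encoding::Utf8 => std::str::from_utf8(&self.bytes)
                .ok()
                .map(|s| s.chars().count()),
            // Two-byte units with surrogate pairs: also a full scan.
            Encoding::Utf16Le => {
                if self.bytes.len() % 2 != 0 {
                    return None;
                }
                let units = self
                    .bytes
                    .chunks_exact(2)
                    .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
                let mut count = 0;
                for decoded in char::decode_utf16(units) {
                    decoded.ok()?;
                    count += 1;
                }
                Some(count)
            }
        }
    }
}
```

The same character "é" takes 1 byte tagged as Latin-1 and 2 bytes tagged as UTF-8 or UTF-16LE, and `char_count` is constant-time for Latin-1 but a linear scan for the other two. That's the trade-off: nothing is paid up front, but callers can't assume uniform costs.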