by JoshTriplett 3620 days ago
> We have one string type defined in std

The standard library also includes Path/PathBuf and OsStr/OsString. And third-party libraries also use [u8] for bytestrings.

It'd be nice to improve handling for user-supplied text where you can't assume UTF-8. For instance, git2-rs provides the contents of diffs as [u8], because it can't assume the diffed files use UTF-8. That led to this commit today: https://github.com/ogham/rust-ansi-term/pull/19/commits/a0da...

That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?

(As much as I'd love to just say "use UTF-8", that would break on many git repositories, including git.git and linux.git.)
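
To make the "can't assume UTF-8" case concrete, here is a minimal sketch. It is not what git2-rs or ansi-term actually do; `Text` and `classify` are hypothetical names made up for illustration. The idea is to check the bytes once and keep them as raw bytes when they aren't valid UTF-8, rather than failing:

```rust
/// Hypothetical wrapper (not a std or git2-rs type): text that is either
/// known-valid UTF-8 or just raw bytes.
#[derive(Debug, PartialEq)]
enum Text<'a> {
    Utf8(&'a str),
    Bytes(&'a [u8]),
}

/// Validate once with `std::str::from_utf8`; keep the original bytes
/// untouched when validation fails.
fn classify(bytes: &[u8]) -> Text<'_> {
    match std::str::from_utf8(bytes) {
        Ok(s) => Text::Utf8(s),
        Err(_) => Text::Bytes(bytes),
    }
}
```

With this, `classify(b"hello")` gives the `Utf8` variant, while a Latin-1 encoded `b"caf\xe9"` comes back as `Bytes`. Every consumer still has to handle both variants, which is roughly the boilerplate being complained about here.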

2 comments

> The standard library also includes Path/PathBuf and OsStr/OsString.

Right. And you want people to explicitly convert between them.

Having to convert between string types isn't a problem. String encoding is hard, and you're going to have to pay that cost somehow.
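
As a rough illustration of what explicit conversion looks like with the std types (the helper names `file_name_utf8` and `file_name_lossy` are made up for this sketch):

```rust
use std::path::Path;

/// Strict conversion: `None` if the path has no file name or the name is
/// not valid UTF-8. `OsStr::to_str` makes the failure case explicit.
fn file_name_utf8(path: &Path) -> Option<&str> {
    path.file_name()?.to_str()
}

/// Lossy conversion: invalid sequences are replaced with U+FFFD, so this
/// only returns `None` when the path has no file name at all.
fn file_name_lossy(path: &Path) -> Option<String> {
    path.file_name().map(|name| name.to_string_lossy().into_owned())
}
```

Either way the caller has to choose between failing and losing information, and that choice is the cost being paid.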

Having to convert isn't a problem. Having to write some algorithms multiple times for different string types is a problem.

Fair. Most of these algorithms could be written generically, I guess.
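
For example, a sketch of one such generic algorithm, assuming it only needs byte-level access (`count_lines` is a hypothetical helper, not something in std):

```rust
/// Counts lines in anything that can be viewed as bytes: `str`, `String`,
/// `[u8]`, `Vec<u8>`, byte arrays. Scanning for b'\n' is safe on UTF-8
/// input because that byte never appears inside a multi-byte sequence.
fn count_lines<T: AsRef<[u8]> + ?Sized>(text: &T) -> usize {
    let bytes = text.as_ref();
    if bytes.is_empty() {
        return 0;
    }
    let newlines = bytes.iter().filter(|&&b| b == b'\n').count();
    // A final line without a trailing newline still counts as a line.
    if bytes.last() == Some(&b'\n') {
        newlines
    } else {
        newlines + 1
    }
}
```

The same function accepts `"a\nb"` and `&b"\xff\xfe\nx"[..]`. This only covers algorithms that never need to look at `char` boundaries; anything that does still needs a separate path for real strings.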

> That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?

This doesn't have anything to do with large vs. small standard libraries, because all of these string types are defined in libstd.

libstd defines varying amounts of string manipulation and abstraction for those string types, though.

I'd love to see additional support for handling bytestrings in libstd, to make it easier to write code that handles both &str and &[u8].