
by JoshTriplett 3620 days ago

> We have one string type defined in std

The standard library also includes Path/PathBuf and OsStr/OsString. And third-party libraries also use [u8] for bytestrings.

It'd be nice to improve handling for user-supplied text where you can't assume UTF-8. For instance, git2-rs provides the contents of diffs as [u8], because it can't assume the diffed files use UTF-8. That led to this commit today: https://github.com/ogham/rust-ansi-term/pull/19/commits/a0da...

That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?

(As much as I'd love to just say "use UTF-8", that would break on many git repositories, including git.git and linux.git.)

2 comments

Manishearth 3620 days ago

> The standard library also includes Path/PathBuf and OsStr/OsString.

Right. And you want people to explicitly convert between them.

Having to convert between string types isn't a problem. String encoding is hard, and you're going to have to pay that cost somehow.
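As a rough illustration of what explicit conversion looks like in practice, here is a minimal sketch using only std. The helper names (`bytes_to_str`, `bytes_to_text`, `path_to_str`) are hypothetical and exist only for this example; the std calls they wrap (`std::str::from_utf8`, `String::from_utf8_lossy`, `Path::to_str`) are real. Each conversion makes the encoding cost visible: it either can fail or can lose information.

```rust
use std::borrow::Cow;
use std::path::Path;

/// Strict conversion: returns None if the bytes are not valid UTF-8.
/// (Hypothetical helper name, wraps std::str::from_utf8.)
pub fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Lossy conversion: invalid sequences become U+FFFD.
/// Borrows when the input is already valid UTF-8, allocates otherwise.
pub fn bytes_to_text(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

/// Paths are not guaranteed to be UTF-8, so this conversion is fallible too.
pub fn path_to_str(path: &Path) -> Option<&str> {
    path.to_str()
}
```

Which of the strict or lossy variants is appropriate depends on whether the caller can tolerate replacement characters; for something like diff contents, neither may be acceptable, which is presumably why the bytes are exposed directly.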

JoshTriplett 3620 days ago

Having to convert isn't a problem. Having to write some algorithms multiple times for different string types is a problem.

Manishearth 3620 days ago

Fair. Most of these algorithms could be written generically, I guess.
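A minimal sketch of what "written generically" could look like, assuming the algorithm only needs to see the input as bytes. Both `str` and `[u8]` implement `AsRef<[u8]>` in std, so one function can accept either. The name `write_bold` and the choice of escape codes are assumptions for illustration; this is not the code from the linked commit.

```rust
use std::io::{self, Write};

/// Writes `text` wrapped in the ANSI "bold" and "reset" escape sequences.
/// Accepts anything viewable as bytes: &str, String, &[u8], Vec<u8>, ...
/// (Hypothetical helper, not part of std or any crate discussed here.)
pub fn write_bold<W: Write>(out: &mut W, text: impl AsRef<[u8]>) -> io::Result<()> {
    out.write_all(b"\x1b[1m")?; // start bold
    out.write_all(text.as_ref())?; // payload is passed through untouched
    out.write_all(b"\x1b[0m") // reset attributes
}
```

This works because the function never needs to interpret the payload as text. Algorithms that do depend on character boundaries (case folding, width calculation, and so on) would not fit behind `AsRef<[u8]>` as easily, which is where the duplication tends to come back.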

pcwalton 3619 days ago

> That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?

This doesn't have anything to do with large vs. small standard libraries, because all of these string types are defined in libstd.

JoshTriplett 3619 days ago

libstd defines varying amounts of string manipulation and abstraction for those string types, though.

I'd love to see additional support for handling bytestrings in libstd, to make it easier to write code that handles both &str and &[u8].
