

by pornel 2716 days ago

It's not a problem in practice, because you'd use something like the `.char_indices()` iterator, the result of a substring search, etc. to get correct offsets in the first place.
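A minimal sketch of what getting correct offsets can look like. The helper names `prefix_chars` and `after` are made up for illustration; the only standard-library calls used are `str::char_indices` and `str::find`, both of which return byte offsets that fall on char boundaries.

```rust
/// Hypothetical helper: the prefix of `s` holding at most `n` chars.
fn prefix_chars(s: &str, n: usize) -> &str {
    // char_indices() yields (byte_offset, char) pairs, so the offset of
    // the n-th char is always a valid place to cut.
    match s.char_indices().nth(n) {
        Some((offset, _)) => &s[..offset],
        None => s, // fewer than n chars: return the whole string
    }
}

/// Hypothetical helper: the text following the first match of `needle`.
fn after<'a>(s: &'a str, needle: &str) -> Option<&'a str> {
    // find() returns the byte offset where the match starts; adding the
    // needle's byte length lands on the boundary right after the match.
    s.find(needle).map(|start| &s[start + needle.len()..])
}

fn main() {
    let s = "ab早上好";
    println!("{}", prefix_chars(s, 3)); // prints: ab早
    println!("{:?}", after(s, "早")); // prints: Some("上好")
}
```

Neither helper ever computes an offset by hand, which is why the slicing cannot panic here.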

It's not useful to blindly read at random offsets in UTF-8 strings. If slicing didn't panic, you'd get garbage. If offsets were automatically moved to skip over garbage, you wouldn't know what you're getting, and your overall algorithm would likely end up with nonsense (duplicated or skipped chars).

For algorithms that don't care about characters or UTF-8 validity, there's the zero-cost `.as_bytes()`.
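As a rough illustration of a byte-level algorithm, here is a sketch that counts one byte value and reads a byte at an arbitrary offset. `count_byte` is a hypothetical helper, not something from the standard library.

```rust
/// Hypothetical helper: counts occurrences of one byte value without
/// decoding any characters.
fn count_byte(s: &str, needle: u8) -> usize {
    s.as_bytes().iter().filter(|&&b| b == needle).count()
}

fn main() {
    println!("{}", count_byte("banana", b'a')); // prints: 3

    // Indexing the byte slice works at any in-range offset, including
    // offsets that sit in the middle of a multi-byte character.
    let s = "ab早";
    println!("{:#x}", s.as_bytes()[3]); // prints: 0x97 (second byte of '早')
}
```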

2 comments

KajMagnus 2716 days ago

Couldn't syntax like `a_string[..3]` be made a compile error in Rust, since that'd almost always be a bug (right?)

And in the rare cases when it's not a bug, one can just use `as_bytes`, which would be good to do in any case, to signal to other humans that this is not a bug.

Btw, I love the error message `[..3]` generates: "thread 'main' panicked at 'byte index 3 is not a char boundary; it is inside '早' (bytes 2..5) of `ab早`'". I've never seen such easy-to-understand error messages in any language (except in a few cases in Scala).
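For reference, a small sketch of the non-panicking alternatives that already exist for the same `ab早` string: `str::is_char_boundary` and `str::get`. The wrapper `checked_prefix` is a made-up name used only to keep the example short.

```rust
/// Hypothetical wrapper around `str::get`: returns `None` instead of
/// panicking when `end` is out of range or not on a char boundary.
fn checked_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    let s = "ab早"; // 'a' and 'b' take one byte each, '早' takes bytes 2..5

    println!("{}", s.len()); // prints: 5
    println!("{}", s.is_char_boundary(3)); // prints: false
    println!("{:?}", checked_prefix(s, 3)); // prints: None
    println!("{:?}", checked_prefix(s, 2)); // prints: Some("ab")

    // `&s[..3]` is the version that panics with the message quoted above.
}
```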


steveklabnik 2716 days ago

We could have never implemented `Index` for `String`, sure. We have, though, so removing it would be a breaking change.


KajMagnus 2715 days ago

Ok. Maybe a compile-time warning, which wouldn't break the build?


steveklabnik 2715 days ago

That could be done, if it was agreed that this is a mis-feature. I don't think there's agreement on that, though.


StavrosK 2716 days ago

What does zero-cost mean in this context? It must cost something to run, no? Or is it basically a compiler hint instructing the next function to treat the data as pure bytes?


burntsushi 2716 days ago

In this particular context, you can think of going from a `&str` to a `&[u8]` via `string.as_bytes()` as a safe cast. The in-memory representation remains the same, and the function call will almost certainly be inlined because its implementation is trivial.
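One way to see the "safe cast" point is to compare pointers and lengths before and after the call. This is only a sketch; `shares_memory` is a hypothetical helper name.

```rust
/// Hypothetical helper: checks that `as_bytes()` hands back a view of the
/// very same memory, with the same length, rather than a copy.
fn shares_memory(s: &str) -> bool {
    let bytes: &[u8] = s.as_bytes();
    bytes.as_ptr() == s.as_ptr() && bytes.len() == s.len()
}

fn main() {
    let s = String::from("ab早");
    println!("{}", shares_memory(&s)); // prints: true

    // Going the other way is not free: from_utf8 has to validate the
    // bytes before it can hand back a &str.
    let back = std::str::from_utf8(s.as_bytes()).unwrap();
    println!("{}", back); // prints: ab早
}
```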
