Hacker News new | ask | show | jobs
by masklinn 2963 days ago
> Rust should expose such a function as well seeing as their strings must be UTF-8 validated.

That's more or less what std::str::from_utf8 is: it runs UTF8 validation on the input slice, and just casts it to an &str if it's valid: https://doc.rust-lang.org/src/core/str/mod.rs.html#332-335

from_utf8_unchecked nothing more than an unsafe (c-style) cast: https://doc.rust-lang.org/src/core/str/mod.rs.html#437 and so should be a no-op at runtime.

1 comments

I meant Rust should have a SIMD optimised version that assumes mostly ASCII. I'm guessing there is a trade-off involved depending on the content of the string.
The linked implementation assumes mostly ASCII. It doesn't use SIMD. SIMD in Rust is a work in progress - the next version (1.27) will stabilize x86-64-specific SIMD intrinsics. There's an rfc (https://github.com/rust-lang/rfcs/pull/2366) for portable SIMD types.
simd support in Rust was only recently accepted and is being implemented so it currently relies on the vectorisation abilities on the compiler (it might get revisited soon-ish I guess).

As for the assumption of mostly-ascii, the validation function has a "striding" fast path for ascii which checks 2 words at a time (so 128 bits per iteration on 64b platforms) until it finds a non-ascii byte: https://doc.rust-lang.org/src/core/str/mod.rs.html#1541