|
|
|
|
|
by ssokolow
991 days ago
|
|
*nod* Rust was given as one of the examples and Rust's .len() behaviour is chosen based on three very reasonable concerns: 1. They want the String type to be available to embedded use-cases, where it's not reasonable to require the embedding of the quite large unicode tables needed to identify grapheme boundaries. (String is defined in the `alloc` module, which you can use in addition to `core` if your target has a heap allocator. It's just re-exported via `std`.) 2. They have a policy of not baking stuff that is defined by politics/fiat (eg. unicode codepoint assignments) into stuff that requires a compiler update to change. (Which is also why the standard library has no timezone handling.) 3. People need a convenient way to know how much memory/disk space to allocate to store a string verbatim. (Rust's `String` is just a newtype wrapper around `Vec<u8>` with restricted construction and added helper functions.) That's why .len() counts bytes in Rust. Just like with timezone definitions, Rust has a de facto standard place to find a grapheme-wise iterator... the unicode-segmentation crate. |
|