by n1ghtm4n
4723 days ago
Why not a 2-word struct with the rune count instead of the byte count? There's zero performance cost for many strings because the rune count is known at compile time. For the rest, most strings are too short for Big-O analysis to be relevant and I would guess (enlighten me if I'm wrong) that the cost of computing the bounds of each character is negligible on a modern processor. Multi-byte chars in a string are going to be adjacent in memory, adjacent in cache, and therefore trivial for today's not-at-all-instruction-bound CPUs. Again, correct me if I'm wrong.
|
Consider a simple string: "école". How many runes does it contain? Possibly five, if the "é" is the single precomposed code point U+00E9. Possibly six, if the "é" is written as a plain "e" followed by the combining acute accent U+0301. If you normalize the string you can guarantee you have the first form, but not every glyph can be represented as a single rune.

Fortunately, you generally don't need to deal with any of this. If you're working with filenames, for example, you really only care about the path separator ('/' or '\' or whatever); everything else is just a bunch of opaque data. You can write a perfectly valid function to split a filename into components without understanding anything about combining characters. When you're dealing with data in this fashion, you rarely if ever care about the number of runes in a string; instead you care about the position of specific runes.