by neild 4723 days ago
It is very seldom that you really want to deal with a string as an array of runes. (If you actually do want to, Go makes it fairly easy: just use []rune rather than string.)

Consider a simple string: "école". How many runes does it contain? Possibly five:

    LATIN SMALL LETTER E WITH ACUTE
    LATIN SMALL LETTER C
    LATIN SMALL LETTER O
    LATIN SMALL LETTER L
    LATIN SMALL LETTER E
Possibly six:

    LATIN SMALL LETTER E
    COMBINING ACUTE ACCENT
    LATIN SMALL LETTER C
    LATIN SMALL LETTER O
    LATIN SMALL LETTER L
    LATIN SMALL LETTER E
If you normalize the string to a composed form (NFC) you can guarantee you have the first form, but not every glyph can be represented as a single rune.
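
To make the two spellings of "école" concrete, here is a rough sketch in Go that counts runes and bytes for each. The helper name runeCount is just something made up for this example; the only library call it relies on is utf8.RuneCountInString from the standard library. The strings are written with \u escapes so you can see exactly which code points are in them.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeCount is a hypothetical helper for this example: it reports the
// number of Unicode code points (runes) in s.
func runeCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9cole" // LATIN SMALL LETTER E WITH ACUTE, then "cole"
	decomposed := "e\u0301cole" // "e", COMBINING ACUTE ACCENT, then "cole"

	// Rune count first, then byte length (len on a string counts bytes).
	fmt.Println(runeCount(precomposed), len(precomposed)) // 5 6
	fmt.Println(runeCount(decomposed), len(decomposed))   // 6 7

	// The two strings render the same but do not compare equal.
	fmt.Println(precomposed == decomposed) // false
}
```

So "how many runes" already depends on which form you were handed, before you even get to glyphs that have no single-rune form at all.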

Fortunately, you generally don't need to deal with any of this. If you're working with filenames, for example, you really only care about the path separator ('/' or '\' or whatever); everything else is just a bunch of opaque data. You can write a perfectly valid function to split a filename into components without understanding anything about combining characters. When you're dealing with data in this fashion, you rarely if ever care about the number of runes in a string; instead you care about the position of specific runes.
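
As a sketch of the filename case, assuming '/' is the separator: the hypothetical splitPath below never looks inside the components, so it neither knows nor cares that the first one contains a combining accent. This is safe on UTF-8 data because the byte for '/' never occurs inside a multi-byte sequence.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPath is a hypothetical helper for this example: it splits a
// slash-separated path into components and treats everything between
// separators as opaque data.
func splitPath(path string) []string {
	return strings.Split(path, "/")
}

func main() {
	// The first component uses the decomposed form (e + COMBINING ACUTE ACCENT).
	path := "e\u0301cole/нєℓℓσ/notes.txt"

	for _, part := range splitPath(path) {
		fmt.Println(part)
	}

	// What matters is the position of the separator, given here as a byte offset.
	fmt.Println(strings.IndexByte(path, '/')) // 7
}
```

Note that the position you get back is a byte offset, not a rune count, which is exactly what you want for slicing the string.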

1 comment

Thank you for the explanation! Converting to a rune slice and back does give me the behavior that I wanted. It still looks butt ugly to me, but at least it works.

In Go:

    fmt.Printf("%s", string([]rune("нєℓℓσ")[1:4]))
    // єℓℓ
In Python:

    print("нєℓℓσ"[1:4])
    # єℓℓ