by Someone 2716 days ago
I think the Rust designers want to keep the implicit contract that indexing into a string is fast, i.e. O(1).

If you want to find the one-millionth codepoint of a UTF-8-encoded string, you have to visit more or less every byte that precedes it (1).
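A minimal sketch of that linear scan, using only the standard library. The function name `nth_codepoint` is made up for illustration; `str::chars` decodes from the start of the string, so the cost grows with the position you ask for.

```rust
/// Returns the n-th codepoint (0-based) of `s`, or `None` if `s` has
/// fewer than n + 1 codepoints.
///
/// `chars()` has to decode every codepoint before the one requested,
/// so this is O(n) in the byte offset of the result, not O(1).
fn nth_codepoint(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```

For example, `nth_codepoint("naïve café", 2)` yields `Some('ï')`, but only after walking over the two codepoints in front of it.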

If, on the other hand, you want to find the codepoint that covers the millionth byte, you have to read at most four bytes. Read the millionth byte; there are three cases:

- it’s a full codepoint. If so, you’re done.

- it is the first byte of a multi-byte codepoint. If so, read forwards in the string for up to 3 continuation bytes.

- it is a continuation byte. If so, search backwards in the string (at most 3 bytes) for the first byte, then, if necessary, read forwards to find more continuation bytes.

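So, that is O(1): the work is bounded by the 4-byte maximum length of a UTF-8 sequence, no matter how long the string is.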
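The three cases above can be sketched roughly as follows. The name `codepoint_at_byte` is hypothetical, not a standard library function; the sketch relies on `&str` always holding valid UTF-8 and uses `str::is_char_boundary` to recognise continuation bytes.

```rust
/// Returns the codepoint whose encoding covers byte `i` of `s`,
/// together with the byte offset at which that codepoint starts.
/// Returns `None` if `i` is past the end of the string.
fn codepoint_at_byte(s: &str, i: usize) -> Option<(usize, char)> {
    if i >= s.len() {
        return None;
    }
    // Step back over continuation bytes. In valid UTF-8 this loop
    // runs at most 3 times, and offset 0 is always a boundary.
    let mut start = i;
    while !s.is_char_boundary(start) {
        start -= 1;
    }
    // Decode a single codepoint from the boundary (reads 1 to 4 bytes).
    s[start..].chars().next().map(|c| (start, c))
}
```

With `"aé€😀"` (encoded as 1, 2, 3 and 4 bytes respectively), `codepoint_at_byte(s, 5)` gives `Some((3, '€'))`, because byte 5 is the last continuation byte of the sequence that starts at offset 3.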

(1) You can skip continuation bytes, since the first byte of a sequence tells you how long it is, but these are typically rare.