For the same reason you'd want to index anything: to slice, remove, replace, and so on. For example, to replace a skin tone in an emoji: "str[i] = 0x1f3ff", or to insert one: "str = str[:i] + 0x1f3ff + str[i:]".
But that's a pointlessly inefficient way to do it - surely what you want there is to iterate and transform rather than scan through and then slice? (And don't you need to group by extended grapheme cluster rather than codepoint anyway for that to make sense?)
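
To make "iterate and transform" concrete, here's a rough Python sketch of the replace case. The function name set_skin_tone and the default tone are made up for illustration; the only fact it relies on is that the emoji skin tone modifiers are the five codepoints U+1F3FB through U+1F3FF. It works codepoint by codepoint, so it only covers swapping a modifier that is already present.

```python
# Emoji skin tone (Fitzpatrick) modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONES = range(0x1F3FB, 0x1F400)

def set_skin_tone(text: str, tone: int = 0x1F3FF) -> str:
    """Replace every existing skin tone modifier in `text` with `tone`.

    Hypothetical helper: a single pass over the codepoints, with no
    index lookup and no slicing.
    """
    return "".join(
        chr(tone) if ord(ch) in SKIN_TONES else ch
        for ch in text
    )

# Thumbs up with the lightest tone becomes thumbs up with the darkest tone.
print(set_skin_tone("\U0001F44D\U0001F3FB") == "\U0001F44D\U0001F3FF")
```

Inserting a tone where there isn't one yet is the case where you'd need to know where each grapheme cluster ends, and this sketch doesn't attempt that.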