by jstimpfle
3206 days ago
> I get that the idea was to maintain indexing via codepoint, but (again) in practice that's not great: usually you want to index via grapheme -- if you want to index at all.

I definitely need indexes, and I don't really care about graphemes. I actually have only a vague idea what that is. I write parsers typically by using a global string and lots of indices.
The important thing for me is to be able to extract characters and slices at given positions, and to be able to say "parse error at line X character Y" where X and Y are helpful to the user most of the time. I would be absolutely fine with working in UTF-8 bytes only (and that would be faster, I guess), but then there would be a more pressing need to recompute character positions (as a code point or grapheme index) from byte offsets at times. There are more abstract parsing methods where parser subroutines are implemented in a position-agnostic way, but I'm very happy with my simple method. If everything works on graphemes instead of code points (as I think Perl 6 does), I will be happy to use that, but it's not so important from a practical standpoint.
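To illustrate the "recompute character positions from byte offsets" step, here is a minimal Python sketch. The function name `line_and_column` is hypothetical, and it assumes the source is valid UTF-8, that lines end in `\n`, and that the offset falls on a code point boundary. The column is counted in code points, not graphemes.

```python
def line_and_column(source: bytes, byte_offset: int) -> tuple[int, int]:
    """Map a byte offset in UTF-8 text to a 1-based (line, column) pair.

    Hypothetical helper for illustration. The column counts code points,
    not grapheme clusters.
    """
    if not 0 <= byte_offset <= len(source):
        raise ValueError("byte offset out of range")

    # Line number: one more than the number of newlines before the offset.
    line = source.count(b"\n", 0, byte_offset) + 1

    # Start of the current line: the byte after the previous newline.
    # rfind returns -1 when there is none, which gives 0.
    line_start = source.rfind(b"\n", 0, byte_offset) + 1

    # In UTF-8, continuation bytes look like 0b10xxxxxx. Every other byte
    # starts a code point, so counting those gives the code point count.
    code_points = sum(
        1 for b in source[line_start:byte_offset] if (b & 0xC0) != 0x80
    )
    return line, code_points + 1


source = "héllo\nwörld x".encode("utf-8")
line, column = line_and_column(source, 14)  # byte 14 is the "x"
message = f"parse error at line {line} character {column}"
```

This scans from the start of the input on every call, which seems fine for error reporting since it only runs when an error is actually reported. A grapheme-based column would need a segmentation library on top of this.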