Hacker News
by lizmat 762 days ago
> It builds a lookup table of grapheme clusters, and represents them in memory as negative i32s.

Only for those grapheme clusters that do not have a representation in Unicode!
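A minimal Python sketch of the scheme discussed here follows. It rests on several assumptions. Text is NFC-normalized first. Segmentation is simplified to "base codepoint plus trailing combining marks" rather than full UAX #29. Only clusters that still span several codepoints receive a negative synthetic id. `SyntheticTable`, `encode` and `decode` are hypothetical names, not Rakudo's or MoarVM's API, and the real implementation (NFG in MoarVM) differs in many details.

```python
import unicodedata

def graphemes(text: str) -> list[str]:
    """Simplified clusters (assumption, NOT full UAX #29): a base codepoint
    followed by any combining marks (Unicode general category M*)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

class SyntheticTable:
    """Hypothetical lookup table mapping multi-codepoint clusters to negative ids."""
    def __init__(self) -> None:
        self.by_cluster: dict[str, int] = {}
        self.by_id: dict[int, str] = {}

    def id_for(self, cluster: str) -> int:
        if cluster not in self.by_cluster:
            new_id = -(len(self.by_cluster) + 1)  # -1, -2, -3, ...
            self.by_cluster[cluster] = new_id
            self.by_id[new_id] = cluster
        return self.by_cluster[cluster]

def encode(text: str, table: SyntheticTable) -> list[int]:
    """One i32-sized slot per grapheme after NFC normalization."""
    out: list[int] = []
    for cluster in graphemes(unicodedata.normalize("NFC", text)):
        if len(cluster) == 1:
            out.append(ord(cluster))            # has its own codepoint: store as-is
        else:
            out.append(table.id_for(cluster))   # no single codepoint: synthetic id
    return out

def decode(units: list[int], table: SyntheticTable) -> str:
    """Map synthetic ids back through the table; ordinary codepoints via chr()."""
    return "".join(table.by_id[u] if u < 0 else chr(u) for u in units)
```

For example, `encode("e\u0301x\u0301", SyntheticTable())` yields `[0xE9, -1]`. NFC folds "e" plus the accent into the single codepoint U+00E9, so only "x" plus the accent, which has no precomposed form, needs a table entry.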

Also, these negative i32s are really an implementation detail.

> but O(1) access to each grapheme is less important

Unless you want regexes to a. be correct in the Unicode world, and b. be performant
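The correctness point can be illustrated with a short Python sketch. Python's built-in `re` matches codepoints, so `.` treats a letter and its combining accent as two separate characters. A grapheme-level sequence has one unit per user-perceived character, and indexing the k-th grapheme is a plain list index. The `graphemes` helper is hypothetical and uses a simplified segmentation rule, not full UAX #29.

```python
import re
import unicodedata

def graphemes(text: str) -> list[str]:
    """Simplified clusters (assumption, not full UAX #29): a base codepoint
    followed by any combining marks (Unicode general category M*)."""
    out: list[str] = []
    for ch in unicodedata.normalize("NFC", text):
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch   # extend the current cluster with the mark
        else:
            out.append(ch)  # start a new cluster
    return out

# "x" + U+0301 COMBINING ACUTE ACCENT has no precomposed codepoint, so NFC
# leaves it as two codepoints even though a reader sees one character.
word = unicodedata.normalize("NFC", "x\u0301yz")

# Codepoint-level matching: three dots cannot cover four codepoints.
print(re.fullmatch(r"...", word))               # None
print(re.fullmatch(r"....", word) is not None)  # True

# Grapheme-level view: three units, matching what a reader sees.
units = graphemes(word)
print(len(units))             # 3
print(units[0] == "x\u0301")  # True
```

This only shows why grapheme units matter for `.`. It says nothing about which storage layout is faster. For comparison, the third-party `regex` package exposes `\X` to match a grapheme cluster without changing how the string is stored.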


> Only for those grapheme clusters that do not have a representation in Unicode!

I think it's reasonable to consider a grapheme cluster consisting of a single codepoint to be a codepoint, not a grapheme cluster. One grape is not a cluster of grapes.

> Also, these negative i32s are really an implementation detail.

What a coincidence! I was explicitly discussing the implementation.

Oh, is this the thing where some people pretend that Raku is different from Rakudo? Fine. Pretend I said Rakudo.

> Unless you want regexes to a. be correct in the Unicode world, and b. be performant

I work extensively on low-level pattern-matching code, so I can say with considerable confidence that blowing up every string to take up four bytes per codepoint or grapheme cluster is not the only way to make regex correct in the Unicode world, nor is it necessarily the best, or even helpful. The assertion that a regex search on a blown-up, custom-tailored string will be more performant than the same search on the string's native UTF-8 representation is hard to justify. It seems evident to me that, by default, it would be less so.
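One alternative to expanding the string can be sketched in Python. The search runs on the native UTF-8 bytes, and a hit is accepted only if it starts and ends on a grapheme boundary, decoding just the one codepoint at each edge. This is a hypothetical illustration, not how any particular regex engine works. The helper names are invented, and the boundary rule is simplified to "a boundary precedes every codepoint that is not a combining mark", not full UAX #29.

```python
import unicodedata

def decode_at(buf: bytes, i: int) -> int:
    """Decode the codepoint whose first byte is buf[i] (assumes valid UTF-8)."""
    b = buf[i]
    if b < 0x80:
        return b
    if b >> 5 == 0b110:
        n, cp = 2, b & 0x1F
    elif b >> 4 == 0b1110:
        n, cp = 3, b & 0x0F
    else:
        n, cp = 4, b & 0x07
    for k in range(1, n):
        cp = (cp << 6) | (buf[i + k] & 0x3F)  # append 6 payload bits per byte
    return cp

def is_boundary(buf: bytes, i: int) -> bool:
    """Simplified grapheme-boundary test at byte offset i."""
    if i == 0 or i == len(buf):
        return True
    if buf[i] & 0xC0 == 0x80:  # continuation byte: inside a codepoint
        return False
    return not unicodedata.category(chr(decode_at(buf, i))).startswith("M")

def find_grapheme_aligned(haystack: bytes, needle: bytes) -> int:
    """Byte-level search; keep only hits aligned to grapheme boundaries.
    Returns a byte offset into haystack, or -1."""
    start = haystack.find(needle)
    while start != -1:
        if is_boundary(haystack, start) and is_boundary(haystack, start + len(needle)):
            return start
        start = haystack.find(needle, start + 1)
    return -1

hay = "caf\u00e9 x\u0301 x".encode("utf-8")
print(find_grapheme_aligned(hay, b"x"))                     # 10 (skips the accented x)
print(find_grapheme_aligned(hay, "x\u0301".encode("utf-8")))  # 6
```

Here `hay` occupies 11 bytes, while one 4-byte slot per grapheme for the same 8 graphemes would take 32 bytes.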

Furthermore, I'm unsure how O(1) access to anything could aid regexen, since using them is O(n) by definition.

I think Raku is an interesting language and that people should check it out, to be clear. That doesn't mean I agree with every choice the Rakudo implementation has made.

> I think Raku is an interesting language and that people should check it out, to be clear

I agree :-)

> That doesn't mean I agree with every choice the Rakudo implementation has made

Indeed. Some of these choices have their roots in the late 1990s / early 2000s. Some of them make less sense now than they did then. FWIW, these are continuously evaluated by the current core team, to continue to improve Rakudo.