Hacker News
by quincunx 1894 days ago
Thanks. It's not that UTF-8 isn't on the list. It has always been on the list; it's just not there yet. Because it matters so much, I felt the need to state its absence explicitly in the manual.

If you're so inclined, examine rex.c, and you'll see (in rex_nfa_make_ranged_trans(), for example) that the engine internally works with ranges of uint32 for this very Unicode reason.

The front-end regex parser and driver code, however, are not there yet, so prior to code emission, these beautiful ranges of uint32 codepoints are back-translated into rote uint8 tables. Such is the fate of wanting to ship. It'll come.
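
To give a rough idea of what that back-translation amounts to, here is a minimal sketch. It is not the code in rex.c: the ranged_trans struct, the flatten_to_bytes() name, the NO_TRANS sentinel, and the choice to clip anything above 0xFF are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transition type: an inclusive range of uint32 codepoints
 * leading to one target state. Not the actual layout used in rex.c. */
typedef struct {
    uint32_t lo;      /* first codepoint in the range, inclusive */
    uint32_t hi;      /* last codepoint in the range, inclusive */
    uint16_t target;  /* destination state */
} ranged_trans;

/* Hypothetical sentinel meaning "no transition on this byte". */
#define NO_TRANS UINT16_MAX

/* Flatten one state's ranged transitions into a 256-entry byte table.
 * Ranges are clipped at 0xFF (an assumption of this sketch); the return
 * value counts the ranges that were clipped or dropped entirely. */
size_t flatten_to_bytes(const ranged_trans *trans, size_t n,
                        uint16_t table[256])
{
    size_t clipped = 0;

    for (size_t b = 0; b < 256; b++)
        table[b] = NO_TRANS;

    for (size_t i = 0; i < n; i++) {
        assert(trans[i].lo <= trans[i].hi);

        if (trans[i].hi > 0xFF)
            clipped++;               /* part of the range cannot be represented */
        if (trans[i].lo > 0xFF)
            continue;                /* entirely outside the byte range */

        uint32_t hi = trans[i].hi > 0xFF ? 0xFF : trans[i].hi;
        for (uint32_t c = trans[i].lo; c <= hi; c++)
            table[c] = trans[i].target;
    }
    return clipped;
}
```

In this sketch, a state with the ranges 'a'..'z' and U+03B1..U+03C9 ends up with only the first one in its table, and the caller learns that one range was lost. That is the kind of information that disappears once codepoint ranges are reduced to uint8 tables.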