Hacker News
by lancestout 4752 days ago
Only Unicode 3.2 is supported simply because the StringPrep framework [1] (which XMPP's nodeprep and various other protocols use) is forever tied to that version of Unicode.

Current work is on the PRECIS framework [2], which uses the metadata for Unicode code points to determine how to handle them during canonicalization instead of relying on a hard-coded set of mapping tables. There's still a lot of work to be done, mainly to verify that the process works reliably and doesn't introduce subtle new issues. Peter Saint-Andre (one of the authors of PRECIS) has just started on a Python tool for testing how a given version of Unicode is handled by PRECIS (https://github.com/stpeter/PrecisMaker).
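To make the difference concrete, here is a rough Python sketch of property-driven handling. It is not the PRECIS algorithm and has nothing to do with how PrecisMaker works; `classify_code_point` and its bucket names are hypothetical, and the buckets are far coarser than the real rules. It only shows that deciding from a code point's General_Category gives a different answer depending on which Unicode version you ask, whereas a frozen 3.2 table never changes.

```python
import unicodedata


def classify_code_point(ch, ucd=unicodedata):
    """Bucket a character using only its General_Category property.

    Hypothetical, deliberately coarse buckets for illustration;
    the real PRECIS rules are considerably more detailed.
    """
    category = ucd.category(ch)
    if category == "Cn":
        return "unassigned"
    if category[0] in ("L", "N"):
        return "letter-or-digit"
    if category[0] == "Z":
        return "space"
    if category[0] == "C":
        return "control-or-format"
    return "other"  # marks, punctuation, symbols


RUPEE = "\u20b9"  # INDIAN RUPEE SIGN, added in Unicode 6.0

# The standard library keeps a Unicode 3.2 snapshot for StringPrep/IDNA use.
frozen = unicodedata.ucd_3_2_0

print(frozen.unidata_version)               # 3.2.0
print(classify_code_point(RUPEE, frozen))   # unassigned
print(classify_code_point(RUPEE))           # other (category Sc)
```

The same function yields "unassigned" against the 3.2 data and "other" against the interpreter's current data, which is the kind of version-to-version behavior that still needs careful review.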

[1] https://www.ietf.org/rfc/rfc3454.txt

[2] https://tools.ietf.org/html/draft-ietf-precis-framework-08

1 comment

Great information, and it's even cooler to see that someone has some code working alongside it. I might have to adapt it to Go if it's reasonably easy to understand, given my relative lack of experience with Unicode (heh, as I'd mentioned, and as is probably obvious given your knowledge of the subject).