by lifthrasiir 3145 days ago
It is a brain-dead solution: any Unicode scalar value not matching /[\u0000-\u10ff\u2000-\u200d\u2010-\u201f\u2032-\u2037]/ doubles the cost. [1] The primary range ends at U+10FF because it conveniently excludes virtually all CJK characters (Hangul starts at U+1100) with relatively low error rates. Yet, it's still brain-dead.

[1] https://twitter.com/FakeUnicode/status/928030981805588480 (used to be /[\u0000-\u10ff]/ during the test period)
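
To make the rule concrete, here is a minimal sketch of that counting scheme in Python. It only mirrors the regex quoted above; the name `weighted_length` is made up for illustration, and it ignores anything else a real implementation might do (normalization, URL handling, and so on).

```python
import re

# Code points inside these ranges cost 1; everything else costs 2.
# The ranges are copied from the regex quoted in the comment above.
LIGHT = re.compile(r"[\u0000-\u10ff\u2000-\u200d\u2010-\u201f\u2032-\u2037]")


def weighted_length(text: str) -> int:
    """Hypothetical helper: sum the per-code-point cost of `text`."""
    # Iterating over a Python str yields one code point at a time.
    return sum(1 if LIGHT.fullmatch(ch) else 2 for ch in text)


print(weighted_length("hello"))    # 5: ASCII is in the primary range
print(weighted_length("\u10ff"))   # 1: last code point of the primary range
print(weighted_length("\u1100"))   # 2: first Hangul Jamo, just past the cutoff
print(weighted_length("한글"))      # 4: two Hangul syllables, 2 each
```

Note that under these ranges something like U+20AC (euro sign) also costs 2, which is the kind of error the cutoff accepts.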