| HN Mirror

by celoyd 5696 days ago

I understand the need for localization and all, but 46 THOUSAND characters? Jeez.

There are that many Han characters alone, so I’m not sure what the surprise is. It’s not like you have to hard-code them in your grammar.

If anything, I’d hope that new languages in 2010 allow any of the roughly 100,000 non-control non-whitespace [edit: non-punctuation] Unicode characters. For a lot of the code I see, ASCII is at least as constraining as, say, fixnums would be.

2 comments

jere_jones 5696 days ago

Actually, it is possible it does. I only ran the test from 0x0000 to 0xFFFF. Maybe I should revisit.

riffraff 5696 days ago

Moreover, you actually already have trivial code for that in $JAVAISH_IMPLEMENTATION's Character.isUnicodeIdentifier{Start,Part}

jere_jones 5696 days ago

I used Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int) to write a file with the ranges that I cut and pasted into my C# code. And thank goodness, too! I'm sure I would've made a typo or missed characters if I had to type all that myself.
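
For anyone who wants to reproduce that, here is a minimal sketch of the kind of range dump described above. It is not the original program: the class name IdentifierRanges, the helper identifierPartRanges, and the output format are made up for illustration. It walks the whole code point space up to Character.MAX_CODE_POINT (0x10FFFF) rather than stopping at 0xFFFF, so characters outside the Basic Multilingual Plane are counted too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the code from the comment above.
final class IdentifierRanges {

    // Collects inclusive [start, end] ranges of code points in [from, to]
    // for which Character.isJavaIdentifierPart(int) returns true.
    static List<int[]> identifierPartRanges(int from, int to) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int cp = from; cp <= to; cp++) {
            if (Character.isJavaIdentifierPart(cp)) {
                if (start < 0) {
                    start = cp; // open a new range
                }
            } else if (start >= 0) {
                ranges.add(new int[] {start, cp - 1}); // close the open range
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, to}); // range runs to the end of the scan
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Scan all of Unicode, not just 0x0000 to 0xFFFF.
        for (int[] r : identifierPartRanges(0, Character.MAX_CODE_POINT)) {
            System.out.printf("0x%04X, 0x%04X,%n", r[0], r[1]);
        }
    }
}
```

Swapping in Character.isJavaIdentifierStart(int) gives the corresponding table for the first character of an identifier. The exact ranges depend on the Unicode version your JDK ships with, so the output can differ between Java releases.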