| HN Mirror



	by Piskvorrr 413 days ago
	Once you start exhaustively enumerating "these are the Only Blessed Ranges," you'll be bitten by the usual brouhaha of "email address ends with .[a-z]{2,3}". We all know how that went, and ".[a-z]{2,4}" didn't cut it either, not even in 2000.
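
To make the point above concrete, here is a minimal sketch of how the two quoted patterns behave. The function name and the sample addresses are made up for illustration, and the dot is escaped so that it matches a literal "." (the quoted patterns leave it bare).

```python
import re

# Hypothetical validators mirroring the patterns quoted above.
# The dot is escaped here so it only matches a literal ".".
TLD_2_3 = re.compile(r"\.[a-z]{2,3}$")
TLD_2_4 = re.compile(r"\.[a-z]{2,4}$")


def ends_with_blessed_tld(address, pattern):
    """Return True if the address ends with a TLD the pattern allows."""
    return pattern.search(address.lower()) is not None


# Sample addresses chosen for illustration (example domains, not real mailboxes).
samples = [
    "alice@example.com",         # 3-letter TLD
    "bob@example.info",          # 4-letter TLD
    "carol@example.museum",      # 6-letter TLD
    "dave@example.photography",  # 11-letter TLD
]

# Map each address to (accepted by {2,3}, accepted by {2,4}).
results = {
    addr: (ends_with_blessed_tld(addr, TLD_2_3), ends_with_blessed_tld(addr, TLD_2_4))
    for addr in samples
}

for addr, (ok_23, ok_24) in results.items():
    print(f"{addr:28} {{2,3}}: {ok_23!s:5} {{2,4}}: {ok_24}")
```

Any hard-coded length range rejects valid addresses as soon as a longer TLD shows up, which is the same failure mode as a hard-coded list of "blessed" character ranges for names.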


nuc1e0n 413 days ago

To add to the complexity, not all Chinese characters in use for names are representable in unicode. Perhaps at some point legal institutions will just have to define the list of characters that people can have in their name as it appears on official documents. This reminds me of that 'what programmers believe about names' article from a while back.


pepa65 412 days ago

If so, I think they would just need to be added to Unicode. Do you have an estimate of how many are missing?


bmn__ 412 days ago

As an interested bystander, I estimate it to be on the order of 10⁵. Email Ken Lunde for better insights.

Note that GP claimed "not representable" (not "not represented"). Based on what I know, that claim feels quite wrong.


bmn__ 413 days ago

> not all Chinese characters in use for names are representable in unicode

Why? How do you come to this conclusion?


SAI_Peregrinus 413 days ago

Han unification[1] prevents the representation of all Chinese characters. There are multiple languages that use Chinese characters, but they don't all use the same characters. Unicode decided to only use Han Chinese characters, so names using other sorts of Chinese characters can't be written with Unicode. The Han "equivalent" characters can be used, but that looks weird.

Think of it as though Unicode decided that the letter "m" wasn't needed to write English text, since you can just write "rn" and it'll be close enough. Someone named "James" might want to have their name spelled correctly instead of "Jarnes", but that wouldn't be possible. Han unification did essentially this.

[1] https://en.wikipedia.org/wiki/Han_unification


bmn__ 413 days ago

I feel it's unlikely that this is the explanation for what GGP had in mind. I postulate that name characters usually have no variants and thus do not undergo unification, or, where there are variants, they are already encoded as Z variants, so the contention is moot either way.

Prove me wrong with a counter-example.


SAI_Peregrinus 412 days ago

https://soranews24.com/2014/02/13/japanese-woman-celebrates-...

First search result.


bmn__ 412 days ago

𫟈 is U+2B7C8 "CJK Unified Ideograph-2B7C8". 𛁻 is U+1B07B "Hentaigana Letter To-5".

Both characters fall into the first category I mentioned: no variants.
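
For anyone who wants to check code points like these locally, here is a rough sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration, and the names printed depend on the Unicode version bundled with your Python, so an older interpreter may not know the newer characters.

```python
import unicodedata


def describe(ch):
    """Format one character as 'U+XXXX NAME' using the local Unicode database."""
    # Fall back to a placeholder if this Python's Unicode tables predate the character.
    name = unicodedata.name(ch, "<not in this Unicode database>")
    return f"U+{ord(ch):04X} {name}"


# The two code points quoted above, written as escapes so the source
# does not depend on font or editor support.
kanji = "\U0002B7C8"
kana = "\U0001B07B"

for ch in (kanji, kana):
    print(describe(ch))
```

This only shows that each character has its own code point and name; it says nothing about how a given font renders it.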
