by trw999 2625 days ago
https://en.m.wikipedia.org/wiki/Four-Corner_Method

You can sort Chinese characters (including kanji, though I'm not sure the Four-Corner Method is used for them) by the Four-Corner Method. Why would you need to sort kanji phonetically in the first place? Do Japanese users actually expect names to be sorted phonetically? English speakers don't expect names to be sorted by IPA, so consistency of the sorting scheme should be all that matters.

6 comments

Kanji in Japanese are sorted by writing them out phonetically in Kana and then sorting by the kana.

English speakers don't expect names to be sorted by IPA because most English speakers don't know IPA, but all Japanese speakers do know the kana.

The fact that they are sorted as if written in kana means that the exact same written character will be sorted differently depending on context. For "not names" we actually do have fairly good algorithms for determining the pronunciation of kanji, but for many names there is literally no way to know how to pronounce them without asking.
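To make the point above concrete, here is a minimal sketch of sorting by reading rather than by written form. The contact list and the helper name `sort_by_reading` are made up for illustration, and the readings are supplied by hand since, as noted, they cannot be derived from the kanji for names. It also assumes that plain code-point order on hiragana is close enough to kana order, which is only an approximation of real Japanese collation.

```python
# Hypothetical (written form, kana reading) pairs. The readings are entered
# by hand: for names they cannot be inferred from the kanji alone.
contacts = [
    ("田中", "たなか"),
    ("佐藤", "さとう"),
    ("東", "ひがし"),
    ("東", "あずま"),
    ("鈴木", "すずき"),
]


def sort_by_reading(entries):
    """Sort (kanji, reading) pairs by the hiragana reading.

    Uses plain code-point order on the reading, which only approximates
    proper Japanese collation.
    """
    return sorted(entries, key=lambda entry: entry[1])


sorted_contacts = sort_by_reading(contacts)
for kanji, reading in sorted_contacts:
    print(kanji, reading)
```

Note that the same written name 東 ends up both first (read あずま) and last (read ひがし), which is exactly the "same character, sorted differently" effect.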

I actually made a serious attempt at learning the Four-Corner Method for kanji [0] and it was very frustrating. It would be difficult to determine what parts of the kanji belonged to which corner, and which exact shape they corresponded to. And strokes wouldn't always be interpreted the way I thought they'd be since it's based on a handwritten representation of the character. Many characters also have multiple FC numbers! The FC method was never meant to uniquely identify specific characters, but just to help narrow down a list of candidates in a dictionary. Funnily enough, I also argue something similar in this thread that has the same drawbacks :)

[0]: Because I was interested in typing characters while not actually knowing the kanji. The Tagaini Jisho app (https://www.tagaini.net/) was indispensable because it lets you search on multiple parameters including partial FC # and simpler methods like SKIP codes (http://nihongo.monash.edu/SKIP.html). The only characters I couldn't transcribe with this method were those printed so small that the individual strokes were difficult to make out.

Don't know about Japanese, but other well-known input schemes for Chinese include Cangjie (https://en.wikipedia.org/wiki/Cangjie_input_method), Zhengma (https://en.wikipedia.org/wiki/Zhengma_method), and Wubi (https://en.wikipedia.org/wiki/Wubi_method).

In fact, all of these non-phonetic input/encoding systems are highly non-intuitive and have a reputation for steep learning curves, so frustration is expected. This is because, in general, Chinese characters or kanji are expected to be pronounced or written by speakers, not indexed in a particular encoding system. Only the pronunciation is the natural form in the language.

The encoding schemes are completely foreign and arbitrary to native speakers. Using them requires extensive and systematic training. In mainland China, Hong Kong and Taiwan in the 80s and 90s, learning to use a computer often started with learning the code. It took at least two months of mechanical memorization to get started and years of use to master, just like how amateur radio operators learn Morse code (edit: well, you don't need to memorize the code for every single character as if it were Morse code, but remembering the standard decomposition of characters in the system is comparable to remembering the Morse code table). And remember, these are native speakers; much greater effort is needed for foreign speakers.

Sure, schemes based on radicals have been used for a thousand years in dictionaries, but all of the schemes used today are completely artificial creations for typing things into and searching for things on computers (often ones with very limited computing power).

The increase in processing power of personal computers in the late 90s allowed phonetic input systems to map pronunciation to characters heuristically, with a high accuracy rate. So those codes are rarely used by Chinese and (I believe) Japanese speakers today.

The exceptions are people who learned computing between the 80s and the mid-90s, people whose jobs involve language or word processing and require typing tens of thousands of characters or creating/searching them in a language-related database, and people who emphasize typing efficiency.

> Because I was interested in typing characters while not actually knowing the kanji.

This is actually a common requirement for people with those jobs, and one of the biggest reasons to keep using these codes. It is especially useful when transcribing texts into a computer or searching for them in a database.

Users also argue that using them helps prevent the modern disease of forgetting how to write characters due to computerization. I do see the point; it is similar to the spell-checker problem in English education.

Phonetic systems for Chinese aren't so prevalent among non-Mandarin speakers.

9 square/Q9 is another popular method.

The stroke-based methods aren't so arbitrary - they are based on the way you write.

> The stroke-based methods aren't so arbitrary - they are based on the way you write.

Indeed, they are not arbitrary.

They were invented precisely to give users something much more meaningful than the telegraph code and the like, which is nothing more than a bunch of numbers. But I say "arbitrary" in the sense that the classification and organization of strokes in the system was chosen artificially by the designers; it is not something that inherently exists in the language and is understood by its speakers. The difficulties are overcome once you are familiar with the system, but extensive effort is needed for a new user to master it.

Based on extremely limited Googling, one of the cases where these codes are still used is written colloquial Cantonese, which lacks any major official support.
Good point.

Some forms of Cantonese romanization (https://en.wikipedia.org/wiki/Hong_Kong_Government_Cantonese...) do exist, but their use is mainly limited to language study and transliteration. And apparently there are multiple projects to create phonetic input systems for Cantonese (see the reference section of this Cantonese Wikipedia article, https://zh-yue.wikipedia.org/wiki/%E7%B2%B5%E8%AA%9E%E6%8B%B...), but all with very limited standardization and official support.

As someone who has attempted to learn it to get back to my roots, I can definitely say the lack of a standard romanization is a huge barrier. Cantonese textbooks are not interchangeable because each series decides which romanization to use, and some even take the liberty of creating their own.

The only body that could sort this out is a government.

- Guangdong won’t, because official Party policy at best discourages use of regional languages. There was a lot of fury back when they attempted to stop provincial broadcasts in Cantonese.

- Hong Kong won’t, because the Government likes to fumble around a lot these days, and because they have a more or less unofficial goal to integrate into China. Cantonese is not an official language, only “Chinese”.

- Macau won’t. They’re too busy trying to sweeten up China to let in more gambling tourists.

The other issue is that there are now some phonetic changes between the mainland and Hong Kong, the two places people would consider authoritative on the subject.

>Do Japanese users actually expect names to be sorted phonetically?

Yes. Usually when there's a list of things you'd see subsections like あ (words that start with a vowel), か (words that start with "k" or "g"), さ (words that start with "s" or "z") and so on, and there's a specific order within each subsection.
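As a rough sketch of the subsection scheme described above, the snippet below buckets kana readings under their row heading, with voiced sounds filed under the unvoiced row (が under か, ざ under さ). The names `ROWS`, `row_of` and `group_by_row` are made up for this example, only hiragana readings are handled, and the order within each bucket is plain code-point order, which is an assumption and not the full ordering a real address book would use.

```python
# Row heading -> kana that are filed under it. Voiced and semi-voiced kana
# are grouped with their unvoiced row, as in the comment above.
ROWS = {
    "あ": "あいうえお",
    "か": "かきくけこがぎぐげご",
    "さ": "さしすせそざじずぜぞ",
    "た": "たちつてとだぢづでど",
    "な": "なにぬねの",
    "は": "はひふへほばびぶべぼぱぴぷぺぽ",
    "ま": "まみむめも",
    "や": "やゆよ",
    "ら": "らりるれろ",
    "わ": "わをん",
}


def row_of(reading):
    """Return the row heading for a hiragana reading, or '#' if unknown."""
    first = reading[0]
    for head, members in ROWS.items():
        if first in members:
            return head
    return "#"  # fallback bucket for anything that is not plain hiragana


def group_by_row(readings):
    """Group readings by row heading, sorted by code point within each row."""
    groups = {}
    for reading in sorted(readings):
        groups.setdefault(row_of(reading), []).append(reading)
    return groups


sections = group_by_row(["すずき", "ごとう", "かとう", "いとう"])
for head, members in sections.items():
    print(head, members)
```

Here ごとう lands in the か section alongside かとう, matching the "k or g" grouping.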

Imagine having your contact list sorted by some geometric function run on each letter of the A-Z alphabet.
That would be marginally better than memorizing ABCDEFGHIJKLMNOPQRSTUVWXYZ. Is there anything inherent about the letter A that makes it get sorted in front of the letter Z?
> Do Japanese users actually expect names to be sorted phonetically?

Yes - or that's how a human would sort them, at any rate.

Think about numbers: they are not sorted phonetically. Even the alphabet is not. All languages that use the alphabet sort it in exactly the same order regardless of pronunciation.
I don't think you're in a position to speak for all of humanity.
I meant a human as opposed to an algorithm.

Point being: kanji words aren't always sorted phonetically, for the reasons described in the article, so a user may not be surprised if they aren't. But when a human is sorting kanji words they do so phonetically by reading.

East Asians usually put the last name before the first name, and most Japanese last names are pronounced differently, so this poses no problem for them phonetically.