Hacker News
by truthexposer 3358 days ago
I've seen this sentiment a lot in Chinese-Americans that are not educated in linguistics, along with other self-loathing sentiments.

First, literacy isn't completely related to the writing system. Look at Spanish speaking countries, where the alphabet is more phonetic than the English alphabet.

Second, Chinese characters that are more complex, i.e. consist of more than one radical, are usually composed of a semantic component, which gives an indication of the character's meaning, and a phonetic component, which gives an indication of the sound of the character. Although this isn't a rule, it helps a lot, and it's not like English doesn't have crazy non-phonetic spellings as well (how tf is "through" supposed to be pronounced for an English learner?)

Last, the Chinese language consists of MANY homophones. This isn't necessarily a bad thing, and not one by design, but something that is the result of being one of the oldest language families in the world. It allows for the concise expression of many things using only single syllables. You might say, but what about the crazy amount of ambiguity if the language has a lot of homophones? Well, ambiguity is a huge problem in all languages and our brains seem to manage. Now, even though homophones aren't a big problem in spoken language, because intonation and prosody give a clue to how to analyze sentences, written language is a different story, and it would be very hard to make an easier system to handle it. For all you engineers, the fact that Chinese has characters is essentially a performance trade-off: more information density in exchange for more ambiguity.

10 comments

It's actually a myth that most Chinese characters have a semantic component (indicating meaning).

And the phonetic component often doesn't correspond to anything any modern person would know.

The problem in both cases is the shift of language through China's long history, and its divergence from the original design of the written characters.

The problem with the semantic components is that the meaning is re-used and stretched over and over. E.g. they used to use a 'foot' radical to represent a journey towards a destination. Later that shifted to mean "in a straight line, not veering left or right". Later that took on the meaning of "straight and narrow" or "straight shooter" or "not-deviant". Then it becomes, "not deviating from the right path". So now, the old foot radical typically means "justice" or "correct".

And the image shifts over time. This is the old foot radical now: 正 Does it look like a foot to you?

The problem with the phonetics is language shift. In many cases, in ancient Chinese, the characters do have a phonetic component that hints at pronunciation, but the pronunciation changed over the last 2000-3000 years, so the pronunciation hint that made perfect sense in the Han dynasty is now meaningless because you're speaking a different language.

The result is that in the end the characters end up being arbitrary phonetic symbols with some arbitrary meanings attached.

Practically speaking, things are more nuanced than the article would imply, though - they're not 3000 completely unique characters, and at least for a lot of nouns, the radicals do serve to broadly categorize things as, e.g. "related to water", which can serve as a reading aid similar to how root words do. The author further claims "With a phonetic writing system like an alphabet or a syllabary, you need only learn a few dozen symbols and you can read most everything printed in a newspaper.", but that's only accurate insofar as reading means "pronouncing"; it's a tradeoff with being able to infer meaning.
>It's actually a myth that most Chinese characters have a semantic component (indicating meaning).

That's because most people only know about the 2000 Chinese characters used in everyday situations, so they have this misunderstanding; others who don't know Chinese heard this and keep on parroting it. If you know more about the language (post-secondary level), you would know Chinese relies heavily on semantic components.

Here is an example of characters I saw in Stanley, Hong Kong a few days ago (as part of a two-line poem).

巍峩 vs the normal form 魏我

Normally the character 魏 means "Tower on an emperor's building" and 我 means "I". By adding the component 山 ("Mountain") to the characters, the word 巍峩 now carries the connotation of epicness one associates with a mountain range, i.e. "The building and I, as impressive as a mountain range."

I'm a little confused by your comment. Are you saying that Chinese writing relies on a semantic component, but that this component is indecipherable to the vast majority of readers of Chinese? That does not sound very useful, especially when it comes to literacy.
>Are you saying that Chinese writing relies on a semantic component

I am saying _advanced_ Chinese writing has a significant semantic component, but most speakers are not knowledgeable enough to realize it.

>That does not sound very useful, especially when it comes to literacy.

Most people can speak English; you probably need an advanced English degree to understand all the puns/word plays in Shakespeare's works (or use Cole's Notes).

The flexibility of a language's intricate details has no direct relationship with the literacy of its speakers (or with how easy it is to become proficient in a language to the point where one can communicate compound ideas with it).

Back to how useful all these are: not really, if you stick with the basics. They are as useful as Shakespeare's or Wordsworth's works in the modern world. You could speak/communicate perfectly fine in English even if you are not able to understand advanced details of English literature, especially the vocab side.

Yes and no; usually it's enough to allow for broadly guessing at meanings[1]. Contrast that with the Roman alphabet - I am a native speaker of English, and if you hand me a book written in German, I can sound it out (to some extent), but without knowing what it means. Reading written Chinese, you may be able to guess at what a line means, but without any chance of sounding it out! (Disclaimer, I studied one year of college-level Mandarin, so I would have a slight chance, at least...). Of course, it's also interesting to think about the impact computing and smartphones are having on written Chinese: stroke order of characters, for example, was less important when using a keyboard for input, but if you're drawing on a touchscreen, it comes back into play.

1: https://en.wikipedia.org/wiki/Radical_(Chinese_characters)#S...

> I am a native speaker of English, and if you hand me a book written in German, I can sound it out (to some extent), but without knowing what it means.

That works well because English and German are very similar. If China moved from characters to Pinyin, someone who doesn't know Chinese could sound it out, but it would be mostly unintelligible (wrong initials, no tones).

Thanks for sharing this contrarian perspective that most Chinese characters lack a genuine semantic component.

I don't know if the data supports this or not, but it's an interesting thought.

Regarding the example of 正, Wikipedia suggests the foot radical is actually 足/⻊(radical 157) [0].

It suggests 正 is not a radical but rather derives from radical 77, "stop (止)."

When you add the top line to form 正, the image meaning becomes "stop in the middle," which seems reasonably aligned with the meaning of "right/justice/normal."

Not sure who's right: Wikipedia or you, so please clarify if Wikipedia is wrong. Thanks!

[0] https://en.wikipedia.org/wiki/Kangxi_radical

it's hard to quantify as "most" (i don't even know, maybe 1000 or so characters), but a lot of them do have a semantic component. Sometimes it's a bit obfuscated because simplification (either the simplified characters instituted by the PRC or other simplifications, like the jōyō kanji instituted by the Japanese in the 20th century) destroyed semantic meaning by fusing roots, or by totally deprecating characters and replacing them with unrelated characters that sound the same.
exactly, so not sure whether the original assertion (i.e., most characters lack semantic meaning) is correct, but it's an interesting thought to consider.

someone could prove/disprove this by analyzing the 3000 most common chars and indicating what percentage contain a semantic component (even if obfuscated).
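A minimal sketch of that count, in Python. The annotations below are a toy, hand-labeled sample (an assumption for illustration, not real frequency data), and treating 我 as having no separable semantic component is likewise an assumption. A real run would need a character-decomposition dataset covering the 3000 most common characters. `semantic_share` and `sample` are hypothetical names.

```python
from typing import Mapping, Optional


def semantic_share(decomposition: Mapping[str, Optional[str]]) -> float:
    """Return the fraction of characters annotated with a semantic component.

    `decomposition` maps each character to its semantic component,
    or to None when no semantic component is annotated.
    """
    if not decomposition:
        return 0.0
    with_semantic = sum(1 for comp in decomposition.values() if comp is not None)
    return with_semantic / len(decomposition)


# Toy, hand-annotated sample using characters mentioned in this thread.
# Illustrative only: NOT a real sample of the most common characters.
sample = {
    "氦": "气",  # helium, annotated with the "gas" component
    "氢": "气",  # hydrogen, annotated with the "gas" component
    "巍": "山",  # annotated with the "mountain" component
    "我": None,  # assumed here to have no separable semantic component
}

print(semantic_share(sample))
```

The interesting work is in building the annotation table, not in the arithmetic; whether an obfuscated component counts as "semantic" is a labeling decision that would have to be fixed before the percentage means anything.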

do you know if wikipedia or the parent is correct about 正 as the radical for foot?

Wikipedia is correct. Radicals do carry a lot of semantic meaning. I think the most apparent examples are the names for chemical elements. For example, the radical 气 means gas. Helium in Chinese is 氦, hydrogen is 氢. So by looking at the character you immediately know the natural state of the element. Moreover, even if you don't know a character, if you see it has the radical, you can guess what it is related to.
It's a myth insomuch that it doesn't apply 100% to all characters.

It works well enough to guess the pronunciation of a character you've never seen. Or maybe a word you know how to say but don't know how to write.

You're speaking out of your behind. Of course the signs are arbitrary with respect to their signifiers, that's any language, but there are still mappings between signs and their signifiers. English is no different: am I supposed to know what an 'm' sounds like by looking at it? Look up the rebus principle; it happens in every language.

And with respect to the phonetic shifts, from a linguistic perspective, most changes of the phonetic radical involve one change of the initial sound, i.e. bilabial to interdental, dental to alveolar, voiced to unvoiced. With respect to the phonetic inventory of the language, these shifts are only one feature shifts within an old SPE framework. These small featural differences, whether conscious or not, (usually unconscious because we acquire languages during our infancy) are picked up and used by the speaker of the language to categorize words.

> You're speaking out of your behind.

Your detailed and substantive comment works just as well without this uncivil lead-in.

it's a much easier way to say, I significantly doubt your credentials as a person educated in both Mandarin Chinese and linguistics
HN strongly values substantive and civil discourse. If that requires a little more work on your part, please engage in it to help make HN a productive forum where discussions like this can take place. (That said, I don't see the longer version as necessary either.)
I agree. I can only recognize on the order of 1,000 characters or so, but it is still obviously not a myth that characters have semantic and phonetic components. I have no idea what the grandparent commenter is talking about.
Historically, there have been repeated attempts in both China and Japan to replace Chinese characters with a phonetic system (and Korea and Vietnam actually went through with it). These efforts haven't panned out, but it's demonstrably not the case that only ignorant émigrés think a phonetic system would be better.
Well, the history of Hangul shows that the difficulty of phoneticizing a language is only the first problem in disseminating a phonetic alphabet.

"Hangul faced opposition by the literary elite, such as Choe Manri and other Korean Confucian scholars in the 1440s, who believed hanja to be the only legitimate writing system, and perhaps saw Hangul as a threat to their status.[9] However, it entered popular culture as Sejong had intended, being used especially by women and writers of popular fiction.[15] It was effective enough at disseminating information among the uneducated that Yeonsangun, the paranoid tenth king, forbade the study or use of Hangul and banned Hangul documents in 1504,[16] and King Jungjong abolished the Ministry of Eonmun (언문청 諺文廳, governmental institution related to Hangul research) in 1506."

https://en.wikipedia.org/wiki/Hangul#History

Pinyin Chinese is an international standard and keyboards would be basically unusable for writing Chinese characters without it: https://en.wikipedia.org/wiki/Pinyin.
> Pinyin Chinese is an international standard

A national standard, at best (in the PRC). In Taiwan, Zhuyin (Bopomofo) is universal, taught in schools and found on laptop keyboards, and romanisations mostly follow Wade-Giles rules. In Hong Kong, Eitel romanisation is used even though their new commie overlords are forcing pinyin and yale canto down their kids' throats.

ISO and United Nations adoption don't qualify Pinyin as an international standard?

Using Taiwan as an example isn't really relevant given that it's majority Cantonese speaking, and Cantonese is a completely different spoken language than Mandarin.

Edit: I'm a clown. I was confusing traditional Chinese with Cantonese. Not the same thing.

Hong Kong is majority Cantonese speaking. Taiwan is not, it's Mandarin (and to a lesser extent Hokkien and Hakka but most young people prefer to speak Mandarin).
You're right, edited my original comment to correct. Thanks.
Are you sure Taiwan is mostly Cantonese?
You're right it's not. I was confusing it with traditional vs simplified Chinese. Thanks for the correction.
Besides Pinyin, there are Cangjie, Q9, Dayi, etc. Plenty of other input methods are in wide use, though Pinyin is the most widespread because it is the way most kids are taught Mandarin.
There are lots of alternatives to Pinyin, some based on the Latin alphabet, others not, like Bopomofo.
Not to mention any of the shape based methods like congkit
Which is supremely confusing as a student of history of that region of the world. Trying to figure out whether a book or article used Pinyin or Wade-Giles, one of the other competing systems, a mishmash of several, or (especially in older sources) made up their own method, made it very interesting trying to keep track of where things were happening and who was doing them.
I don't think there was ever a 'solid attempt' in Japanese to do a strictly phonetic system. The language has such a high degree of homophony that assigning Chinese characters in the written system does allow some level of useful disambiguation.

It gets really complicated with Japanese - there's a many-to-many-to-many <symbolic>-<phonetic>-<semantic> relationship... One character can have many different readings and, due to the homophony, any given reading can have divergent meanings. For example - HASHI can mean bridge, "side of something", or chopsticks - and the character for "bridge" can also be read "kyo" depending on the context.

I imagine Chinese is much easier for a Japanese native speaker to learn than vice versa for those reasons (since often some of the phonetic forms, the "on-yomi" of the characters, are directly borrowed, with some inflective changes).

> ... something that is the result of being one of the oldest language families in the world.

That makes no sense. If we had a time machine, we'd be able to trace every modern language back to some African tribe that became the ancestor of all modern humans.

Perhaps you meant having a literary tradition that's among the oldest in the world, but even that doesn't necessarily explain why there would be so many homophones. Languages lose words all the time, even ones with great literary traditions.

I meant the development of written languages happened independently across different regions of the world, and of course at some point, languages became mutually unintelligible, and language changes within the languages happened differently.

And while of course every language descends from another, and ancient Chinese isn't understandable to a modern Chinese speaker, there is a long tradition within the Chinese language, and change within the language hasn't been shaped by things as drastic as displacement of the linguistic community, or forced political rule under a foreign power, etc.

> it's not like English doesn't have crazy non-phonetic spellings as well (how tf is "through" supposed to be pronounced for an English learner?)

English is probably the worst case among alphabetic languages, but you can still usually guess the exact pronunciation of an unfamiliar word.

> the Chinese language consists of MANY homophones.

But still Chinese speakers make themselves understood without difficulty, even over the telephone. Most Chinese words are polysyllabic. Taking that into account, homophones are rarer than you might think.

There's a language+, Dungan (https://en.wikipedia.org/wiki/Dungan_language), which is to some extent mutually intelligible with Mandarin. It's written entirely in the Cyrillic alphabet without diacritics to indicate tone. Sample text: http://www.omniglot.com/babel/dungan.htm

+ You could call it a dialect, but it has its own script.

> I've seen this sentiment a lot in Chinese-Americans that are not educated in linguistics, along with other self-loathing sentiments.

FWIW, I think Ted Chiang (author of "Story of Your Life", which Arrival was based on) knows a decent amount about linguistics.

At this point, Chinese characters aren't going to go anywhere. But I think it's an interesting thought experiment to puzzle through.

No, he has opinions on language, not knowledge of linguistics. He is a science fiction writer. It's an actual academic discipline, you know, not just people harrumphing in armchairs.

The Sapir-Whorf hypothesis, which underlies Arrival, has been debunked for years in academia, and the movie, if detached from that hypothesis, is more of a thought experiment on the effect of the linearity of time on human thought than a thought experiment on anything related to language itself.

> The Sapir-Whorf hypothesis...has been debunked for years in academia...

If even the weak form Sapir-Whorf hypothesis (in restricted contexts) is debunked, then I'd like to read up on it. Where can I look into that because a Google by this layperson doesn't turn up anything on it? Thanks in advance.

The weak form isn't debunked, but the hypothesis as a whole is debunked to the degree of being common sense. Of course psychology, sociology, and anthropology may argue for stronger forms to advance their own theories, but most theories from any discipline that studies this sort of thing would agree that the influence cognition and language have on each other is not one-directional, that the influences are not very direct, and that the influence they do display on each other may be caused by some other phenomenon.

Tell the "Eskimos see X different kinds of snow" example to any linguist and they will roll their eyes, debating whether or not to explain, for the umpteenth time, why the example is probably not only incorrect but fabricated.

Thank you for the concise explanation. I'm curious about the weak form because our own field's "Blub Paradox" seems to loosely allude to the weak form (though there are arguments against drawing such an analogy [1]). If even the weak form in limited contexts is discredited, then that is one less possible explanation for the Blub Paradox (and alas, one less possible set of remedies to correct the paradox). I suspect that the weak form is a poorly-understood expression of a cognitive mechanism (perhaps with application to AGI?) we don't clearly grasp yet; some irony there.

[1] https://news.ycombinator.com/item?id=61157

That's a pretty arrogant statement. Can a science fiction writer not have knowledge of linguistics?
no, but I doubt that if I ask him what X-bar theory is, that he'll have any clue what I'm talking about, and that he would even think I'm talking about a theory of generative syntax.
you make a lot of assumptions.
>The Sapir-Whorf hypothesis, which underlies Arrival, has been debunked for years in academia

Well the "academia" has been debunked for even longer -- especially when it comes to soft sciences, so that doesn't say much.

I agree with you. The author even admits right off the bat that he failed to learn the language. I don't understand the mindset of someone who would write an article critiquing Chinese characters, knowing so little about them.

If the author can be taken this seriously given his self-professed ignorance, I am probably much more qualified than him to speak about Chinese characters.

There are two points worth disambiguating here. One is whether Chinese characters hinder the literacy of native speakers and the other is whether they hinder that of learners. Neither I nor the author has any authority to speak about the former.

As a heritage speaker, I actually only recently learned Chinese to a level where I consider myself literate. My experience was that the characters were not an obstacle, like the author suggests, but an indispensable tool for rapidly learning the language.

To learn any language, it is unavoidable that you need to memorize thousands of new words. Memorizing a Chinese character is not much harder than memorizing a word. However, the magic of Chinese characters comes when you combine them to form actual words.

The vast majority of Chinese words consist of two characters, but because each character also encodes meaning, you can more often than not guess the meaning of a word you have never seen before.

Although you can take advantage of common roots for words in other languages, the scale is simply incomparable. For other languages, you pretty much have to memorize every new word you see.

My experience has been that while learning Chinese characters may be a higher upfront cost (although I disagree), once you learn enough, you rapidly understand way more vocabulary because characters themselves encode semantic information.

Perhaps it's simply the lack of any cognates between English and Chinese, and a tinge of cultural supremacy, that make people, like the author, think it's Chinese characters that are the root of everything that makes the language difficult to learn.

The fact that Chinese characters hinder literacy is frequently discussed by linguist and Sinologist Victor Mair on Language Log: http://languagelog.ldc.upenn.edu/nll/?cat=18

Your reaction would be reasonable if someone had claimed that a particular spoken language was hard to learn. Spoken languages are all equally easy, because they are all learned by children in a few years.

But written languages can be arbitrarily difficult, and in Chinese, people with post-graduate educations will often forget how to write common words. This is not cultural supremacy. This is linguistics.

> ... a lot in Chinese-Americans that are not educated in linguistics, along with other self-loathing sentiments

The world's languages, of course, developed nicely without intervention from linguistics, a latter-day mostly retrospective discipline.

> First, literacy isn't completely related to the writing system. Look at Spanish speaking countries, where the alphabet is more phonetic than the English alphabet.

Perhaps more to the point, Japan, which uses Chinese characters for nouns, adjectives, adverbs, and verb-roots, has a very high literacy rate. [1]*

[1] https://nces.ed.gov/pubs2014/2014008.pdf

* Not trying to cherry-pick for Japan being at the top, but Japan's not listed in the 2015 UNESCO report that's referenced all over the place. This US DoE report from 2013 seems legitimate enough.

> Perhaps more to the point, Japan, which uses Chinese characters for nouns, adjectives, adverbs, and verb-roots, has a very high literacy rate.

While literacy might be high, there's been a stream of articles over the years in Japanese media noting that literacy in kanji (the component of Japanese based on Chinese characters) in particular has been steadily declining, even among the highly educated; e.g. http://www.japantimes.co.jp/news/2013/07/03/national/kanji-w...

That's a fair point - seems related to the fact that people aren't writing as much anymore.

Anecdotally, I feel like Japanese picks a better point in the tradeoff space than other languages - Chinese characters are used where they get some leverage, with phonetics for everything else, even with a bunch of half-kanji words. Still, I'd imagine some of the less-commonly used ones will get dropped over time.

On the other hand, English+Emoji+Txt simplification seems to be approaching the same point from the other direction.

English needs to reform its written language as well. It isn't as bad as Chinese, but there is a reason "only two languages commonly have writing competitions: Chinese and English." (I'm not sure who to attribute this to; it isn't mine, though I probably got it wrong.)

Spanish adults read at a 5th grade level, English adults read at a 6th grade level, Japanese adults read at a 9th grade level. This isn't a reflection of education; it is a reflection of how difficult the written language is to learn.

As a bad speller with a passing knowledge of Spanish I'm jealous: when I hear a Spanish word I can spell it, with English I have no clue.

Note that even in Spanish there is room for confusion (source: I'm Spanish), as in some cases it's very simple to write homophones (like vaca / baca, both valid, not related at all.) My father, who always told me he had very bad orthography, always asked me if that word started with h, or if that other was written with b / v, or g / j.
That's certainly the exception and not the rule
How does Chinese handle the propagation of neologisms?

A rough 1:1 phonetic correspondence between oral and written languages intuitively (to me at least) would be more fit for the creation and propagation of new words, or am I missing some other natural form of conveyance and propagation that exists in the Chinese writing system?

It's actually extremely easy to create new words using Chinese characters. You simply put existing characters together. To use some computing related terms for example:

mobile phone: 手机 (hand machine)

programming: 编程 (compose orders)

QR code: 二维码 (2d code)

hacker: 黑客 (black guest, this also sounds phonetically like "hacker")

For brands and other names, there is a system of transliterating sounds using characters, or simply a literal translation of the name, e.g.

Android: 安卓 (an zhuo)

Microsoft: 微软 (small soft)

Apple: 苹果 (literally apple)

Note that the word for crisis, 危机, which the author said was "danger plus opportunity", shares the same last character as 手机. While compounds are easy to compose and sometimes to understand, it's not as simple as the author implied.
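For the engineers, the composition idea and its limits can be sketched in a few lines of Python. This is only an illustration: the glosses are the ones given in the examples above, the table and the `gloss` helper are hypothetical, and a real dictionary would list many more senses per character.

```python
from typing import Dict, List

# Per-character glosses, taken from the examples in this comment.
# 机 carries two senses here: "machine" (手机) and "opportunity" (危机).
GLOSSES: Dict[str, List[str]] = {
    "手": ["hand"],
    "机": ["machine", "opportunity"],
    "黑": ["black"],
    "客": ["guest"],
    "微": ["small"],
    "软": ["soft"],
    "危": ["danger"],
}


def gloss(word: str) -> List[List[str]]:
    """Return the candidate glosses for each character of a word.

    Characters missing from the table are glossed as ["?"].
    """
    return [GLOSSES.get(ch, ["?"]) for ch in word]


print(gloss("手机"))  # [['hand'], ['machine', 'opportunity']]
print(gloss("危机"))  # [['danger'], ['machine', 'opportunity']]
```

The lookup gives a reader candidate meanings for an unfamiliar compound, but picking the right sense of 机 still depends on context, which is the caveat above.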