| Arbitrarily saying "No you don't" isn't indicative of an informed opinion. I haven't really dug into these papers, but the Stanford paper does say "This conclusion is consistent with results from part-of-speech tagging experiments, where we found that radicals of previous word are not a helpful feature, although the radical of the current word is.", whereas the quote you pulled out has to do with language modeling. That said, I wouldn't consider a single negative result from before the deep learning trend took off necessarily indicative of the value of radicals. The more recent paper, on the other hand, sees a positive boost from their "hierarchical radical embeddings" vs. traditional word or character embeddings on 4 classification tasks. Not that this is necessarily meaningful either. In my mind, the usefulness of this would be not that you would get new information, per se, but that you could generalize some amount of knowledge to rare/out-of-vocabulary words. Since you work in the field, though, do you have any pointers to good papers on Chinese NLP? |
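To make the rare/out-of-vocabulary point concrete, here is a minimal sketch of backing off from a character vector to a radical-level vector. It is not the method of either paper mentioned above: the radical table, the 2-d vectors, and the function names `radical_vectors` and `embed` are all made up for illustration, and averaging is just the simplest way to get a radical vector.

```python
# Toy, hand-written radical table (assumption: a real system would use a
# full character-to-radical resource instead).
RADICAL_OF = {"河": "氵", "湖": "氵", "海": "氵", "松": "木", "林": "木"}

# Made-up 2-d "embeddings" for characters seen in training.
# 海 and 林 are deliberately missing to play the role of rare characters.
CHAR_VECS = {"河": [1.0, 0.0], "湖": [0.5, 0.5], "松": [0.0, 1.0]}


def radical_vectors(char_vecs, radical_of):
    """Average the vectors of all known characters that share a radical."""
    sums, counts = {}, {}
    for ch, vec in char_vecs.items():
        rad = radical_of.get(ch)
        if rad is None:
            continue
        acc = sums.setdefault(rad, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[rad] = counts.get(rad, 0) + 1
    return {rad: [v / counts[rad] for v in acc] for rad, acc in sums.items()}


def embed(ch, char_vecs, rad_vecs, radical_of):
    """Return the character vector, else its radical's vector, else None."""
    if ch in char_vecs:
        return char_vecs[ch]
    rad = radical_of.get(ch)
    if rad in rad_vecs:
        return rad_vecs[rad]
    return None


RAD_VECS = radical_vectors(CHAR_VECS, RADICAL_OF)
sea = embed("海", CHAR_VECS, RAD_VECS, RADICAL_OF)      # unseen, falls back to 氵
forest = embed("林", CHAR_VECS, RAD_VECS, RADICAL_OF)   # unseen, falls back to 木
```

With these toy numbers, the unseen 海 ends up near 河 and 湖 rather than getting a generic unknown-token vector, which is the kind of generalization I mean. Whether that helps on a real task is exactly what the papers disagree about.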
- https://aclanthology.info/pdf/I/I05/I05-7002.pdf This paper makes use of the radicals to build an ontology, and it does so with a stunning amount of depth (historical context, variants, etc.) that most works overlook. Too bad no data is available.
- http://www.persee.fr/doc/clao_0153-3320_1978_num_4_1_1047 A very interesting read on how the Vietnamese formed Chinese-like characters. Some of the techniques described were also used by the Japanese when adopting sinograms.
- I didn't read the paper, but the references section lists a number of papers on the segmentation of Mandarin: http://www.anthology.aclweb.org/F/F12/F12-3001.pdf
- I haven't read it yet, but it seems to contain accurate information on the Chinese writing system: http://learnlab.org/uploads/mypslc/publications/perfetti-lex...
Anyway, I think that to get a fair understanding of the writing system, you need to learn about 600 characters in either Chinese or Japanese, plus the basics of the chosen language.