Hacker News new | ask | show | jobs
by lifthrasiir 4272 days ago
Some examples follow. I've really tested with arbitrary text on the Web, and I agree that they are somewhat marginal examples. (But I do think that Franc's margin for CJK languages is way wide.)

한국어 문서가 전 세계 웹에서 차지하는 비중은 2004년에 4.1%로, 이는 영어(35.8%), 중국어(14.1%), 일본어(9.6%), 스페인어(9%), 독일어(7%)에 이어 전 세계 6위이다. 한글 문서와 한국어 문서를 같은 것으로 볼 때, 웹상에서의 한국어 사용 인구는 전 세계 69억여 명의 인구 중 약 1%에 해당한다.

This text from Korean Wikipedia is about the ratio of Korean documents over all documents in the Internet. Digits distort the overall ratio and Franc doesn't give any candidates (even no "und").

現行の学校文法では、英語にあるような「目的語」「補語」などの成分はないとする。英語文法では "I read a book." の "a book" はSVO文型の一部をなす目的語であり、また、"I go to the library." の "the library" は前置詞とともに付け加えられた修飾語と考えられる。

This text from Japanese Wikipedia concerns about the distinction of objectives and complements in the English syntax. In this bilingual text it looks like that Japanese has reached the 60% threshold but the codepoint count doesn't.

2 comments

I pushed a fix, incorporating your suggestions, and your examples in the specs.

Thanks a lot!

Oh you’re right. I think I have a fix in mind, will work on it. Thanks so much!