Hacker News
by munchbunny 1830 days ago
It's also not just edge cases in English or any other specific language. The concept of a "word" doesn't exist in some languages. Chinese, for example, has only a rough equivalent, and its boundaries are determined by context rather than marked syntactically. So how do you define "word count" for a document that mixes Chinese and English? Ignoring the Chinese characters altogether violates the spirit of the metric, and counting them with English rules (splitting on whitespace, say) does too.
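
To make that concrete, here is a rough Python sketch of two plausible "word count" policies applied to one mixed sentence. The function names, the sample text, and the choice to treat only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as Chinese are my own assumptions for illustration, not a standard.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as "Chinese".
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Assumption: a "Latin word" is a run of ASCII letters, digits, or apostrophes.
LATIN_WORD = re.compile(r"[A-Za-z0-9']+")


def count_by_whitespace(text: str) -> int:
    """English-style policy: every whitespace-separated chunk is one word."""
    return len(text.split())


def count_per_ideograph(text: str) -> int:
    """Alternative policy: each ideograph is one unit, plus each Latin word."""
    return len(CJK_CHAR.findall(text)) + len(LATIN_WORD.findall(text))


# Hypothetical sample: English words around an unspaced Chinese clause.
sample = "I like 我喜欢喝茶 and coffee"

print(count_by_whitespace(sample))   # 5: the whole Chinese clause is one "word"
print(count_per_ideograph(sample))   # 9: five ideographs plus four Latin words
```

Neither number is obviously right. A reader would probably segment the Chinese part as 我 / 喜欢 / 喝 / 茶, which gives yet another total, and getting that automatically takes a dictionary or a statistical segmenter rather than a syntax rule.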