by Rioghasarig 1825 days ago
Yeah, the issue seems to be that the concept of a "word" isn't precisely defined. But I don't think that's a problem. I'd be concerned if the numbers were dramatically different, but they're pretty close, so there isn't an issue.

If exact precision is necessary, you probably shouldn't rely on imprecise terms like "word".
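
To make the "not precisely defined" point concrete, here's a rough sketch of three plausible definitions of "word" applied to the same sentence. The function names and the sample sentence are made up for illustration, and the sentence is deliberately packed with edge cases (contractions, hyphens, a number), so the spread is wider than you'd typically see on ordinary prose.

```python
import re

def count_by_whitespace(text: str) -> int:
    # A "word" is any run of non-whitespace characters.
    return len(text.split())

def count_by_word_chars(text: str) -> int:
    # A "word" is any run of letters, digits, or underscores,
    # so apostrophes and hyphens split tokens apart.
    return len(re.findall(r"\w+", text))

def count_by_letters(text: str) -> int:
    # A "word" is a run of ASCII letters, optionally joined by
    # apostrophes or hyphens; bare numbers are not counted.
    return len(re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text))

sample = "Don't over-think it: state-of-the-art isn't 100% precise."

print(count_by_whitespace(sample))   # 7
print(count_by_word_chars(sample))   # 13
print(count_by_letters(sample))      # 6
```

None of these is "the" right answer. They're just different choices about what counts, which is why two tools can disagree without either being buggy.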

2 comments

Agreed. If you base a salary on an imprecise concept, the pay will be imprecise too. Real-world concepts don't always fit neatly into the strict rules we try to encode them in. Changing the constraints may prove easier than programming a precise solution around an imprecise concept.
It's also not just a matter of edge cases in English or any other specific language. The concept of a "word" doesn't exist in some languages. Chinese, for example, only has something comparable, and it is contextual rather than syntactic. So how do you define "word count" in a document that mixes Chinese and English? Ignoring the Chinese characters altogether seems to go against the spirit of the metric, and counting them with English syntax rules still gives you a number that goes against it.
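
A quick sketch of the mixed-text problem, under some loud assumptions: the sample string is invented, the CJK range covers only the basic CJK Unified Ideographs block, and "one ideograph = one unit" is a counting convention, not a linguistic claim about where Chinese words begin and end.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).
CJK_CHAR = r"[\u4e00-\u9fff]"
# Assumption: a Latin "word" is letters/digits, optionally joined by ' or -.
LATIN_WORD = r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*"

def count_whitespace(text: str) -> int:
    # English-style rule: split on whitespace, so an unspaced
    # Chinese phrase counts as a single "word".
    return len(text.split())

def count_latin_only(text: str) -> int:
    # Ignore the Chinese characters altogether.
    return len(re.findall(LATIN_WORD, text))

def count_per_ideograph(text: str) -> int:
    # Count every ideograph as one unit, plus each Latin word.
    return len(re.findall(f"{CJK_CHAR}|{LATIN_WORD}", text))

mixed = "我喜欢 Python 编程 and open source"

print(count_whitespace(mixed))     # 6
print(count_latin_only(mixed))     # 4
print(count_per_ideograph(mixed))  # 9
```

Three rules, three numbers, and a segmenter that tries to find context-dependent word boundaries would likely give yet another one. Which of these matches the "spirit of the metric" is a policy decision, not something the code can settle.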