by eMSF 2332 days ago

>I was initially using wc but that got too slow so I wrote something that was like wc but only did line counting.

Any reasonable wc should be fast enough for that purpose; just remember to use the '-l' switch to activate the fast path for line counting.

>Some things I noted when doing my testing is that each program counted the number of lines, characters, and words differently in this corpus.

All programs should report the same line count, which should be exactly the number of line feed characters in the input.
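
To make that definition concrete, here is a minimal sketch of line counting as "number of line feed bytes". The function name `count_lines` and the sample input are made up for illustration; this is not how any particular wc is implemented.

```python
def count_lines(data: bytes) -> int:
    # A "line" here is simply one line feed byte (0x0A); nothing else matters.
    return data.count(b"\n")


# Hypothetical sample input: the last line has no trailing line feed,
# so it does not add to the count.
sample = b"first\nsecond\nthird"
line_count = count_lines(sample)
```

Because only the line feed byte is counted, the result does not depend on the locale or on the encoding of the rest of the input.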

Other than that, it really depends on your current locale and the wc you're using (see the https://github.com/expr-fi/fastlwc README for more details). Do note that this D implementation seems to implement the character counting behaviour wc gives you with the '-m' switch (instead of the default byte counting).
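
A rough sketch of why byte counts and character counts diverge, assuming UTF-8 input. The sample string is made up, and Python's notion of a character (a code point) only approximates what wc reports with '-m' in a UTF-8 locale.

```python
# Hypothetical sample: two of the characters take two bytes each in UTF-8.
text = "naïve café\n"
data = text.encode("utf-8")

# Byte count: what you get when counting raw bytes.
byte_count = len(data)

# Character count: decode first, then count code points.
char_count = len(data.decode("utf-8"))
```

For pure ASCII input the two numbers are identical, so the difference only shows up on a corpus that contains multibyte characters.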

Also worth noting that different operating systems have different locale definitions: glibc locales explicitly treat non-breaking spaces as non-whitespace characters, while Windows, for example, does not.
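
The effect on word counts can be sketched in Python, which happens to offer both conventions. This is only an analogy under stated assumptions: `bytes.split()` splits on ASCII whitespace only, so a non-breaking space stays inside a word, while `str.split()` treats U+00A0 as whitespace. The sample line is made up.

```python
# Hypothetical sample: "one" and "two" are joined by a non-breaking space.
line = "one\u00a0two three"

# ASCII-only whitespace: the non-breaking space does not separate words.
words_nbsp_not_space = line.encode("utf-8").split()

# Unicode-aware whitespace: the non-breaking space separates words.
words_nbsp_is_space = line.split()

count_without = len(words_nbsp_not_space)
count_with = len(words_nbsp_is_space)
```

The same input therefore yields two different word counts depending only on how whitespace is classified.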