This is a great point about the handling of printable vs. non-printable characters, which I originally missed when I read the wc code. Thank you for pointing this out!
I ask because the essay and your comments until now show no insight from reading the code.
I find it difficult to understand how anyone could miss that (rather significant) part of the core algorithm, and then assert the differences are due to only "modulo intended Unicode space handling" and the like.
Until now I had assumed you had a lay understanding of wc, and had not read the code.
I was mostly curious about how wc handles spaces, and whether ignoring non-ASCII spaces brings me closer to or farther from what wc does. So I focused on that, and this specific handling of printable characters didn't catch my eye.
On a meta level, I wasn't even considering that the notion of a word might be different from "a sequence of characters that aren't space characters".
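To make the difference concrete, here is a rough Python sketch of the two notions of a word. It is only an illustration of the distinction discussed in this thread, not a port of the wc source: the function names are made up, and Python's `str.isspace()` and `str.isprintable()` stand in for the C locale functions, so the results will not match wc exactly.

```python
def count_words_whitespace(text: str) -> int:
    """Naive notion: a word is a maximal run of non-space characters."""
    return len(text.split())


def count_words_printable(text: str) -> int:
    """Assumed printable-aware notion: only a printable, non-space
    character can start a word; other characters are skipped."""
    words = 0
    in_word = False
    for ch in text:
        if ch.isspace():
            # Whitespace always ends the current word.
            in_word = False
        elif ch.isprintable():
            # A printable character starts a word if we are not in one.
            if not in_word:
                words += 1
            in_word = True
        # Non-printable, non-space characters neither start nor end a word.
    return words


if __name__ == "__main__":
    sample = "\x01 a"
    print(count_words_whitespace(sample), count_words_printable(sample))
```

For ordinary text the two counts agree. They diverge only when a run of non-space characters contains no printable character at all, such as a lone control byte.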