Isn't 100 lines too much for a simple utility like wc? I get that it has many edge cases to cover, but edge cases usually need their own handling when you run into them anyway. I'd rather use my one-liner (it does the calculation in a single pass, and even supports parallel processing!):
Do you get a speed-up if you use getc_unlocked() instead of getc()? And if you write your own isspace()? As far as I know, isspace() is locale-sensitive.
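To make the question concrete, here is a rough sketch of what I mean, not a benchmark and not anyone's actual wc. It assumes a POSIX system (getc_unlocked(), flockfile() and funlockfile() are POSIX, not ISO C) and ASCII input. The names wc_counts, ascii_isspace and wc_count are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getc_unlocked, flockfile */
#include <stdio.h>

/* Hypothetical names, for illustration only. */
struct wc_counts {
    unsigned long lines, words, bytes;
};

/* ASCII-only whitespace test: space, \t, \n, \v, \f, \r.
 * Unlike isspace(), it never consults the current locale. */
static int ascii_isspace(int c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Count lines, words and bytes in a single pass over the stream. */
static struct wc_counts wc_count(FILE *in)
{
    struct wc_counts n = {0, 0, 0};
    int c, in_word = 0;

    flockfile(in);                      /* take the stream lock once... */
    while ((c = getc_unlocked(in)) != EOF) {   /* ...not per character */
        n.bytes++;
        if (c == '\n')
            n.lines++;
        if (ascii_isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;                /* first byte of a new word */
            n.words++;
        }
    }
    funlockfile(in);
    return n;
}
```

Calling wc_count(stdin) from main() and printing the three fields gives roughly the default wc output for ASCII text. Whether it is actually faster than getc() plus isspace() would have to be measured on the libc in question.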
https://www.ioccc.org/2019/burton/prog.c
It is the Burton entry from the International Obfuscated C Code Contest, but it seems to handle only ASCII.
https://www.ioccc.org/years-spoiler.html
Limits:
"Requires the C locale and ASCII character set. Input should be less than ten million octets to avoid this problem."
https://www.ioccc.org/2019/burton/hint.html