by jxndnxu
733 days ago
Because classic wc isn't iterating over every byte once, but multiple times. It's especially obvious in the Unicode case, where it first takes 1-4 bytes to decode a Unicode character and then passes that character to another function to check whether it's whitespace. But even with the naive ASCII approach, if you don't hand-roll a state machine you are checking multiple conditions on each byte (is it a space, am I leaving a word, etc.). Using a DFA has fixed compute per byte.
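To make the "fixed compute per byte" point concrete, here's a rough sketch of a table-driven DFA counter. It is ASCII-only, and the state names, `build_table`, and `count` are made up for illustration; this is not how any particular wc implementation is written. The idea is that each byte costs exactly one table lookup and one counter increment, with no per-byte branching on "is it a space" or "am I leaving a word".

```python
# Hypothetical four-state DFA: the state names are illustrative only.
WS, NL, WORD_START, IN_WORD = range(4)


def build_table():
    """Build table[state][byte] -> next state for all 256 byte values."""
    table = []
    for state in range(4):
        row = [0] * 256
        for byte in range(256):
            if byte == 0x0A:
                row[byte] = NL            # newline
            elif byte in b" \t\r\v\f":
                row[byte] = WS            # other ASCII whitespace
            elif state in (WS, NL):
                row[byte] = WORD_START    # first byte of a new word
            else:
                row[byte] = IN_WORD       # continuing the current word
        table.append(row)
    return table


TABLE = build_table()


def count(data: bytes):
    """Return (lines, words, bytes) using one lookup and one increment per byte."""
    counts = [0] * 4
    state = WS
    for byte in data:
        state = TABLE[state][byte]
        counts[state] += 1
    # Each entry into NL is a line; each entry into WORD_START is a word.
    return counts[NL], counts[WORD_START], len(data)
```

For example, `count(b"hello  world\nfoo\n")` gives `(2, 3, 17)`. Handling UTF-8 in the same style would mean adding states for multi-byte sequences to the table rather than decoding characters first, but that part is left out here.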
|
An actual sampled profile showing the two approaches would be interesting. Naively it seems like it's just because it has faster UTF-8 handling, and nothing to do with being a state machine exactly.