Hacker News new | ask | show | jobs
by zadokshi 734 days ago
> why this is better than just iterating over characters?

The explanation it provides regarding Apache vs Ngnix seemed to me to imply that Apache requires memory allocation to read headers into buffers and Ngnix uses a state machine to avoid memory allocation. Memory allocation is slower than most people realise.

1 comments

Why is this better for wc, though? wc is not doing memory allocation in the hot loop.
Because classic wc is not iterating over every byte once, but multiple times.

It's especially obvious in the Unicode case where it first takes 1-4 bytes to get a Unicode character and then checks this character with another function to see if it's whitespace

But even with with naive ASCII approach, if you don't hand roll a state machine you are checking multiple conditions on each byte (is it a space and am I leaving a word etc)

Using a dfa has fixed compute per byte

Those 1-4 bytes are sitting in a register the entire time and thus basically free to read as often as you want, though.

An actual sampled profile showing the two approaches would be interesting. Naively it seems like it's just because it has faster UTF8 handling and nothing to do with being a state machine exactly

According to the authors it's also faster on files full of 'x' or ' ', so there must be more than just better unicode support.
Even something as simple as wc calling a libc function for parsing utf-8, that doesn't get inlined, would destroy its performance relative to anything optimized.

Personally, I'd expect SIMD to win over all of these. wc sounds like kind of challenge that's very easy to partition and process in chunks, though UTF-8 might ruin that.

> very easy to partition and process in chunks

Which counters to increment at each byte depends on the previous bytes, though. You could probably succeed using overlapping chunks, but I wouldn't call it very easy.

I think it keeps wc simple if an algorithm is introduced not directly related to count words it becomes more complicated to debug. I see this a an example of the Unix philosophy of do one thing well.