Hacker News
by andy_threos_io 1920 days ago
Most of the implementations can't handle words longer than the read buffer (64k). The C one certainly can't.

edit: Also, the C implementation will get into an infinite loop with more than 64k words

1 comments

Yeah, I noticed that too: https://github.com/benhoyt/countwords/blob/9d81d13711e56c250...

The constraints do say that it is okay to assume lines are of reasonable length. But yes, if you were making GNU wordfreq, that might not be something you want to assume. At that point, it just depends on what you want your failure mode to be, I suppose. greps, for example, will happily gobble up as much memory as they can to fit a line into memory. :-)
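
For illustration only, here is a rough Python sketch of the two failure modes discussed above. It is not code from the countwords repo. The names `count_words`, `max_word_len`, and `WordTooLongError` are made up for this example. The idea is to carry a partial word across chunk boundaries so that a word longer than the read buffer is neither split nor able to stall the loop. With `max_word_len=None` the carried word grows without bound, and with a limit set the function fails loudly instead.

```python
import io
from collections import Counter


class WordTooLongError(ValueError):
    """Hypothetical error type for the 'fail loudly' failure mode."""


def count_words(stream, chunk_size=64 * 1024, max_word_len=None):
    """Count whitespace-separated words in a binary stream read in chunks.

    max_word_len=None lets a carried partial word grow without bound;
    an integer caps the buffered partial word and raises once exceeded.
    """
    counts = Counter()
    carry = b""  # partial word left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # EOF: every read consumes input, so the loop terminates
        data = carry + chunk
        words = data.split()  # splits on runs of ASCII whitespace
        if words and not data[-1:].isspace():
            # The last word may continue in the next chunk, so hold it back.
            carry = words.pop()
        else:
            carry = b""
        if max_word_len is not None and len(carry) > max_word_len:
            raise WordTooLongError(
                f"buffered word exceeds {max_word_len} bytes"
            )
        counts.update(w.lower() for w in words)
    if carry:
        counts[carry.lower()] += 1  # final word with no trailing whitespace
    return counts


# Usage: a 20-byte word read through an 8-byte buffer is still one word.
demo = io.BytesIO(b"Hello hello " + b"a" * 20 + b" world")
print(count_words(demo, chunk_size=8).most_common(1))
```

In the unbounded mode, memory use scales with the longest word, and re-splitting the carried bytes on every chunk makes a single huge word cost quadratic time. A real implementation would probably scan only the new chunk for whitespace.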