by romka2 2631 days ago
Thanks, that's good to know.

The final version of ListDir calls memcmp only when files in the same directory share the same first 8 characters. Apparently, this is rare enough that memcmp doesn't show up in the CPU profile. But if it ever does, I'll look into replacing it with something else.
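For anyone who hasn't read the article, here is a minimal sketch of that idea. It is not the actual ListDir code: the names `prefix_key` and `name_less`, and the zero-padded 256-byte name buffers, are assumptions made for illustration. The first 8 bytes are compared as one integer, and memcmp runs only on a tie.

```c
#include <stdint.h>
#include <string.h>

/* Assumed for this sketch: every name sits in a zero-padded 256-byte buffer. */
enum { NAME_BUF = 256 };

/* Pack the first 8 bytes into an integer, most significant byte first,
   so that integer order matches byte-wise (memcmp) order. */
static uint64_t prefix_key(const char *name) {
  uint64_t k = 0;
  for (int i = 0; i < 8; ++i) k = (k << 8) | (unsigned char)name[i];
  return k;
}

/* Returns 1 if a sorts before b, 0 otherwise.
   memcmp is reached only when the 8-byte prefixes are identical. */
static int name_less(const char *a, const char *b) {
  uint64_t ka = prefix_key(a), kb = prefix_key(b);
  if (ka != kb) return ka < kb;
  return memcmp(a + 8, b + 8, NAME_BUF - 8) < 0;
}
```

In a real sorter the key would presumably be computed once per entry rather than on every comparison; it is recomputed here only to keep the sketch short.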


You should try with the pathological but relatively common case of thousands of files named 'logname.YYYYMMDD.log.gz'
The final implementation of ListDir has two worst-case scenarios. One is when all files share the same first 8 characters but then differ almost immediately. This is bad because I'm telling memcmp that I have 256 bytes of data, which causes it to use the vectorized loop only to exit it on the first iteration. The other is when all files are 255 characters long and differ at the very end. This is bad because string comparisons become very expensive. Even though I'm not showing these benchmarks in the article, this implementation performs very well even in these cases.
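To make the two cases concrete, here is a rough sketch that builds one pair of names for each. The helper names and the zero-padded 256-byte buffers are assumptions made for illustration, not code from the article or its benchmarks.

```c
#include <stddef.h>
#include <string.h>

/* Assumed for this sketch: names are at most 255 bytes and are stored in
   zero-padded 256-byte buffers. */
enum { NAME_MAX_LEN = 255, NAME_BUF = 256 };

/* Index of the first differing byte, or NAME_BUF if the buffers are equal. */
static size_t first_diff(const char *a, const char *b) {
  size_t i = 0;
  while (i < NAME_BUF && a[i] == b[i]) ++i;
  return i;
}

/* Case 1: identical 8-byte prefix, difference right after it. */
static void make_early_diff(char *a, char *b) {
  memset(a, 0, NAME_BUF);
  memset(b, 0, NAME_BUF);
  strcpy(a, "logname.a.gz");
  strcpy(b, "logname.b.gz");
}

/* Case 2: 255-character names that differ only in the last character. */
static void make_late_diff(char *a, char *b) {
  memset(a, 'x', NAME_MAX_LEN);
  memset(b, 'x', NAME_MAX_LEN);
  a[NAME_MAX_LEN - 1] = 'a';
  b[NAME_MAX_LEN - 1] = 'b';
  a[NAME_MAX_LEN] = '\0';
  b[NAME_MAX_LEN] = '\0';
}
```

In the first pair the comparison is decided at byte 8, immediately after the shared prefix. In the second it is decided at byte 254, so nearly the whole buffer has to be scanned.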