by stabbles 164 days ago
Author here. There is a part 2 to this: https://stoppels.ch/2022/11/30/io-is-no-longer-the-bottlenec...

Hello, a couple years ago I participated in a contest to count word frequencies and generate a sorted histogram. There's a cool post about it featuring a video discussing the tricks used by some participants. https://easyperf.net/blog/2022/05/28/Performance-analysis-an...

Some other participants said that they measured zero difference in runtime between pshufb+eq and eqx3+orx2, but I think your problem has more classes of whitespace. For the histogram problem, how to hash all the words in a chunk of the input matters far more than how to obtain the bitmasks of word-start or word-end positions.
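
For anyone unfamiliar with the two variants, here is a rough scalar sketch in Python of what they compute. It is not the contest code and not real SIMD: the three-character whitespace set, the lookup table layout, and all function names are my own assumptions for illustration. The "eqx3+orx2" version compares each byte against three whitespace characters and ORs the results. The "pshufb+eq" version looks up a 16-entry table by the low nibble of each byte and compares the result with the byte itself. Both should yield the same whitespace bitmask, from which word-start and word-end bitmasks follow with a shift and an AND.

```python
# Assumed whitespace set: space, newline, tab (three classes, hence three compares).
SPACE, NEWLINE, TAB = 0x20, 0x0A, 0x09

# 16-entry table indexed by the low nibble of a byte, emulating a pshufb lookup.
# Entry i holds the whitespace byte whose low nibble is i. Every other entry is 0,
# whose low nibble (0) differs from its index, so it can never equal the input byte.
LOOKUP = [0] * 16
for ws_byte in (SPACE, NEWLINE, TAB):
    LOOKUP[ws_byte & 0x0F] = ws_byte


def ws_mask_eq_or(chunk: bytes) -> int:
    """'eq x3 + or x2': three equality tests per byte, combined with two ORs."""
    mask = 0
    for i, b in enumerate(chunk):
        if (b == SPACE) | (b == NEWLINE) | (b == TAB):
            mask |= 1 << i
    return mask


def ws_mask_lookup(chunk: bytes) -> int:
    """'pshufb + eq' emulation: one table lookup and one equality test per byte."""
    mask = 0
    for i, b in enumerate(chunk):
        if LOOKUP[b & 0x0F] == b:
            mask |= 1 << i
    return mask


def word_starts(ws: int, n: int, prev_is_ws: bool = True) -> int:
    """Bit i is set when byte i is non-whitespace and byte i-1 is whitespace."""
    full = (1 << n) - 1
    prev = ((ws << 1) | int(prev_is_ws)) & full
    return ~ws & full & prev


def word_ends(ws: int, n: int, next_is_ws: bool = True) -> int:
    """Bit i is set when byte i is non-whitespace and byte i+1 is whitespace."""
    full = (1 << n) - 1
    nxt = (ws >> 1) | (int(next_is_ws) << (n - 1))
    return ~ws & full & nxt


chunk = b"io is\tno\nbottleneck"
ws = ws_mask_eq_or(chunk)
starts = word_starts(ws, len(chunk))
ends = word_ends(ws, len(chunk))
print(f"{ws:b}", f"{starts:b}", f"{ends:b}")
```

The `prev_is_ws` and `next_is_ws` flags stand in for the carry between adjacent chunks. In a real SIMD loop the masks would come from a movemask-style instruction rather than a Python loop, and either way the hashing of the words found still has to happen afterwards.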

Awesome! The slides with roofline analysis are great! https://docs.google.com/presentation/d/16M90It8nOK-Oiy7j9Kw2...
If this is on a single core then the "6GB/s" guy is disproven not just in theory but also in practice.