Hacker News new | ask | show | jobs
by rockwotj 897 days ago
Yeah so I had a discussion on Twitter about this, turns out 12GB is small enough to fit into memory, and the author runs submissions by running a solution 5 times in a row, so using direct IO actually hurts because having the kernel cache is a way to enforce the file is in memory for the 4 runs after. I have a direct IO solution with SIMD string search and double parsing, just in C++ (using libraries). It runs in 6 seconds on my 24 core linux box (NVMe).

Code: https://github.com/rockwotj/1brc

Discussion on Filesystem cache: https://x.com/rockwotj/status/1742168024776430041?s=20

4 comments

I missed the "5 times in a row." If you do that, yeah, keeping the whole thing in memory is far better.
> double parsing

In case you haven't noticed yet, the input format guarantees exactly one fractional digit, so you can read a single signed integer followed by `.` and one digit instead.

Yeah I missed this originally, and stuff could be faster with this assumption without a full double parser. The fastest java solution dies some near branchless decoding for these
could you just add the character values eg 49 for ascii 1, and then subtract off the offset once at the end instead of doing atoi on each line?

edit: doh that works for min and max but the average overflows.

Yes. I'm not sure it'll help, but def worth a try.
Wow, that's pretty fast considering how simple main.cc looks. I do love c++. Nice use of coroutines, too.
So you are basically at the mercy of the OS caching algorithm. That sounds like a bad plan for a benchmark. You are not measuring what you think you are (your code), you are measuring the OS caching policy.