by rockwotj 897 days ago

Yeah, so I had a discussion on Twitter about this. It turns out 12GB is small enough to fit into memory, and the author evaluates submissions by running each solution 5 times in a row, so using direct IO actually hurts: the kernel page cache is what keeps the file in memory for the 4 runs after the first. I have a direct IO solution with SIMD string search and double parsing, just in C++ (using libraries). It runs in 6 seconds on my 24-core Linux box (NVMe).

Code: https://github.com/rockwotj/1brc

Discussion on Filesystem cache: https://x.com/rockwotj/status/1742168024776430041?s=20

4 comments

pclmulqdq 897 days ago

I missed the "5 times in a row." If you do that, yeah, keeping the whole thing in memory is far better.

lifthrasiir 897 days ago

> double parsing

In case you haven't noticed yet, the input format guarantees exactly one fractional digit, so you can read a single signed integer followed by `.` and one digit instead.
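A minimal C++ sketch of that idea, not taken from any submission: since every reading has exactly one fractional digit, it can be stored as an integer number of tenths and no floating-point parser is needed. The name `parse_tenths` is hypothetical, and the code assumes well-formed input (optional `-`, integer digits, `.`, one digit).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: parses a reading such as "-12.3" or "4.0" into
// tenths of a degree (-123 and 40 respectively).
// Assumption: the text is well formed, i.e. an optional '-', one or more
// integer digits, a '.', and exactly one fractional digit.
inline std::int32_t parse_tenths(std::string_view text) {
    assert(text.size() >= 3);  // shortest valid reading is "d.d"
    std::size_t i = 0;
    const bool negative = text[0] == '-';
    if (negative) ++i;

    std::int32_t value = 0;
    // Accumulate the integer digits up to the decimal point.
    for (; text[i] != '.'; ++i) {
        value = value * 10 + (text[i] - '0');
    }
    // Fold in the single fractional digit that follows the '.'.
    value = value * 10 + (text[i + 1] - '0');
    return negative ? -value : value;
}
```

Min, max, and sum can then be kept as integers, with a single division by 10 when the results are printed.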

rockwotj 897 days ago

Yeah, I missed this originally. Things could be faster with this assumption, since it removes the need for a full double parser. The fastest Java solution does some nearly branchless decoding for these.

fragmede 897 days ago

Could you just add the character values (e.g. 49 for ASCII '1') and then subtract the offset once at the end, instead of doing atoi on each line?

Edit: doh, that works for min and max, but the average overflows.
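A rough C++ sketch of the deferred-offset idea, under a simplifying assumption the real input does not guarantee: every reading is non-negative and exactly `dd.d` wide. The raw bytes still need their place-value weights (100, 10, 1), so the bias removed at the end is 111 × `'0'` per reading. The name `sum_tenths_deferred` is hypothetical. With a 64-bit accumulator each reading adds at most 6327, so a billion rows stay near 6 × 10^12; a narrower accumulator would overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: sums readings in tenths of a degree while leaving the
// ASCII '0' bias in place, then removes the whole bias in one subtraction.
// Assumption: every reading is exactly "dd.d" (two integer digits, '.',
// one fractional digit) and non-negative.
inline std::int64_t sum_tenths_deferred(
        const std::vector<std::string_view>& readings) {
    std::int64_t raw = 0;
    for (std::string_view r : readings) {
        assert(r.size() == 4 && r[2] == '.');
        // Weighted sum of the raw character codes, no per-digit "- '0'".
        raw += 100 * r[0] + 10 * r[1] + r[3];
    }
    // Each reading carries a bias of (100 + 10 + 1) * '0'.
    const std::int64_t bias_per_reading = 111 * static_cast<std::int64_t>('0');
    return raw - bias_per_reading * static_cast<std::int64_t>(readings.size());
}
```

Handling signs and one-digit integer parts would need a separate count per layout, since each layout carries a different bias.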

gpvos 897 days ago

Yes. I'm not sure it'll help, but def worth a try.

winrid 897 days ago

Wow, that's pretty fast considering how simple main.cc looks. I do love C++. Nice use of coroutines, too.

alain94040 896 days ago

So you are basically at the mercy of the OS caching algorithm. That sounds like a bad plan for a benchmark. You are not measuring what you think you are (your code), you are measuring the OS caching policy.
