by pkolaczk 2205 days ago
Each test was executed twice for the same program. The caches were always dropped before the first run, but not the second run. The second run was performed immediately after the first one finished, to check how much it can benefit from cached files.

I guess the difference is drastic because of the following:

* parallelism: running 8 or more threads keeps the I/O busy. The I/O utilisation for single-threaded runs is very low, because the program is either waiting for data or processing data, but never both at the same time. I guess SSDs are generally very fast when they have a full queue of requests waiting to be served.

* the total amount of data to read is larger than my RAM. Other programs seem to just load all the data they read into the page cache as they go. By the time they finish, the files touched at the beginning have already been pushed out. Fclones caches only up to the point when there is no more free RAM left. This way it also doesn't push other programs' data out of the page cache.

* I observed that when running many threads, the OS keeps the CPU frequency close to the top. With one thread, there is probably too much waiting for I/O and the frequency stays close to the bottom, at 800 MHz. Paradoxically, when I was running fclones -t 1 (single thread) with Google Meet active at the same time, fclones returned faster than without Meet running.
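
To make the parallelism point concrete, here is a minimal Python sketch of the general idea: several threads each read and hash a different file, so the drive always has multiple requests in flight. This is not fclones' actual code (fclones is written in Rust); `hash_file`, `hash_files`, the SHA-256 choice and the 1 MiB chunk size are my own assumptions for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Read one file in chunks and return (path, hex digest).

    Hypothetical helper; the chunk size is an arbitrary choice.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return path, digest.hexdigest()


def hash_files(paths, threads=8):
    """Hash many files with a pool of worker threads.

    While one thread is blocked waiting for the drive, the others can
    issue their own reads or hash data they already have, which keeps
    more requests queued than a single-threaded loop would.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(hash_file, paths))
```

Calling `hash_files(paths, threads=1)` gives the same result as `threads=8`; only the number of reads in flight at once changes, which is the effect described in the first bullet.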