by jared_hulbert
283 days ago
YES! gcc and clang don't like to optimize this, but they do if you hardcode size_bytes to an aligned value. It kind of makes sense: what if a user passes size_bytes as 3? With enough effort the compilers could handle this, but it's a lot to ask.

I just ran it with MAP_POPULATE and the results are interesting. It speeds up the counting loop, which now runs as fast as or faster than my tests that read() into a malloc'd buffer. HOWEVER... populating the buffer takes longer overall. The end result is that the full test runs about 2.6 seconds slower than the original (14.551 s vs. 11.972 s total). I did not guess that one correctly.

time ./count_10_unrolled ./mnt/datafile.bin 53687091200
unrolled loop found 167802249 10s processed at 5.39 GB/s
./count_10_unrolled ./mnt/datafile.bin 53687091200 5.58s user 6.39s system 99% cpu 11.972 total
time ./count_10_populate ./mnt/datafile.bin 53687091200
unrolled loop found 167802249 10s processed at 8.99 GB/s
./count_10_populate ./mnt/datafile.bin 53687091200 5.56s user 8.99s system 99% cpu 14.551 total
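
To make the size_bytes = 3 point concrete, here is a rough sketch of handling the unaligned case by hand instead of asking the compiler to work it out. It is not the code from the benchmark: the name count_tens, the block width of 8, and the assumption that the data is a stream of 32-bit integers are all mine. The idea is to give the main loop a fixed inner trip count and push whatever is left over into a scalar tail loop. Whether a given gcc or clang version vectorizes the main loop is something I have not measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original benchmark): counts how many
 * 32-bit words in buf are equal to 10. Assumes the data is a stream of
 * 32-bit integers in native byte order. */
size_t count_tens(const unsigned char *buf, size_t size_bytes)
{
    /* Whole words only; a trailing 1-3 bytes cannot form a word and is ignored. */
    size_t n_words = size_bytes / sizeof(uint32_t);
    /* Largest multiple of 8 words that fits: the "aligned" part. */
    size_t n_main = n_words - (n_words % 8);
    size_t count = 0;

    /* Main loop: the inner loop always runs exactly 8 times. */
    for (size_t i = 0; i < n_main; i += 8) {
        uint32_t w[8];
        memcpy(w, buf + i * sizeof(uint32_t), sizeof w);
        for (int j = 0; j < 8; j++)
            count += (w[j] == 10);
    }

    /* Scalar tail: the 0-7 words left over after the main loop. */
    for (size_t i = n_main; i < n_words; i++) {
        uint32_t w;
        memcpy(&w, buf + i * sizeof(uint32_t), sizeof w);
        count += (w == 10);
    }
    return count;
}
```

With size_bytes = 3 both loops run zero times and the function returns 0, so the odd case costs nothing in the hot path.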
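
For anyone who wants to try the MAP_POPULATE variant, this is roughly what the mapping call looks like. It is a minimal sketch, not the source of count_10_populate: the helper name map_populated and the choice of PROT_READ with MAP_PRIVATE are my assumptions. MAP_POPULATE is Linux-specific and asks the kernel to pre-fault the mapping's pages during the mmap() call, so the page-fault work moves out of the counting loop and into the setup step. That fits the numbers above, where the loop is faster but the total run is longer.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes MAP_POPULATE on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original benchmark): maps a whole file
 * read-only and asks the kernel to pre-fault its pages.
 * Returns NULL on failure; on success stores the file size in *len_out. */
const unsigned char *map_populated(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);             /* mmap() rejects zero-length mappings */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller is expected to munmap() the pointer with the returned length when done. Dropping MAP_POPULATE from the flags gives the plain lazy-faulting mapping for comparison.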