by injinj 1642 days ago
Decent speedup on my 3970X: decompression time drops from 4.65s to 3.58s. Including this patch reduced the number of instructions in lzma_decoder.s (using gcc 8.3.1) by about 8% (4417 lines of asm vs 4824). With perf stat, branch misses fall from 409M to 104M, an astonishing reduction.

Using the firefox example from the gist:

  $ tar -cJf lib.tar.xz /usr/lib64/firefox

The xz shipped with the system:

  $ perf stat xz -c -d lib.tar.xz > /dev/null                                                                               

  Performance counter stats for 'xz -c -d lib.tar.xz':

          4,650.32 msec task-clock:u              #    1.000 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
               591      page-faults:u             #    0.127 K/sec                  
    19,849,912,300      cycles:u                  #    4.269 GHz                      (83.33%)
       425,290,878      stalled-cycles-frontend:u #    2.14% frontend cycles idle     (83.33%)
     1,831,640,390      stalled-cycles-backend:u  #    9.23% backend cycles idle      (83.34%)
    23,973,036,103      instructions:u            #    1.21  insn per cycle         
                                                  #    0.08  stalled cycles per insn  (83.33%)
     2,939,144,233      branches:u                #  632.031 M/sec                    (83.34%)
       409,371,860      branch-misses:u           #   13.93% of all branches          (83.33%)

       4.650679926 seconds time elapsed

       4.611657000 seconds user
       0.011931000 seconds sys

The patched xz:

  $ git clone http://git.tukaani.org/xz.git
  $ cd xz/src
  $ patch -l -p1 < ../faster_lxma_decoder_x86.patch
  $ cd .. ; ./autogen.sh && ./configure && make
  $ cd src
  $ export LD_PRELOAD=./liblzma/.libs/liblzma.so
  $ perf stat ./xz/.libs/xz -c -d ../../lib.tar.xz > /dev/null                                                              

  Performance counter stats for './xz/.libs/xz -c -d ../../lib.tar.xz':

          3,578.54 msec task-clock:u              #    1.000 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
               593      page-faults:u             #    0.166 K/sec                  
    15,186,685,715      cycles:u                  #    4.244 GHz                      (83.32%)
       108,663,507      stalled-cycles-frontend:u #    0.72% frontend cycles idle     (83.32%)
     8,753,057,119      stalled-cycles-backend:u  #   57.64% backend cycles idle      (83.34%)
    27,322,182,837      instructions:u            #    1.80  insn per cycle         
                                                  #    0.32  stalled cycles per insn  (83.35%)
     1,979,944,734      branches:u                #  553.282 M/sec                    (83.34%)
       104,752,154      branch-misses:u           #    5.29% of all branches          (83.34%)

       3.578973194 seconds time elapsed

       3.549329000 seconds user
       0.011942000 seconds sys
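
For anyone who wants the ratios without a calculator, here is a quick Python sketch that derives them from the counters in the two runs above. The dict and function names are made up for illustration and are not part of xz or perf; the numbers are copied straight from the perf stat output.

```python
# Counters copied from the two perf stat runs above (system xz vs patched xz).
system = {
    "elapsed_s": 4.650679926,
    "cycles": 19_849_912_300,
    "instructions": 23_973_036_103,
    "branches": 2_939_144_233,
    "branch_misses": 409_371_860,
}
patched = {
    "elapsed_s": 3.578973194,
    "cycles": 15_186_685_715,
    "instructions": 27_322_182_837,
    "branches": 1_979_944_734,
    "branch_misses": 104_752_154,
}


def pct_change(before, after):
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


def summarize(before, after):
    """Return per-counter percent changes plus speedup and branch miss rates."""
    summary = {key: pct_change(before[key], after[key]) for key in before}
    summary["speedup"] = before["elapsed_s"] / after["elapsed_s"]
    summary["miss_rate_before"] = before["branch_misses"] / before["branches"] * 100.0
    summary["miss_rate_after"] = after["branch_misses"] / after["branches"] * 100.0
    return summary


if __name__ == "__main__":
    for name, value in summarize(system, patched).items():
        print(f"{name:>18}: {value:8.2f}")
```

That works out to roughly a 1.30x speedup in elapsed time and about 74% fewer branch misses. Executed instructions actually go up by about 14%, so the gain seems to come from far better branch prediction (IPC rises from 1.21 to 1.80) rather than from doing less work.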