Hacker News new | ask | show | jobs
by to3m 3692 days ago
You can see the problem here:

First it cals CFAbsoluteTimeGetCurrent and saves the result.

    0x100272dab <+10443>: callq  0x1002b7b38 ; CFAbsoluteTimeGetCurrent
    0x100272db0 <+10448>: movapd %xmm0, -0xa0(%rbp)
Here is the call to flatDecodeDirect. I guess RDI is the input. That's usual for the x64 ABI.

    0x100272db8 <+10456>: movq   -0x100(%rbp), %rdi
    0x100272dbf <+10463>: callq  0x10026fb10 ; flatDecodeDirect
I don't know what this next bit is for.

    0x100272dc4 <+10468>: testq  %rax, %rax
    0x100272dc7 <+10471>: js     0x10027444c ; <+16236> [inlined] generic specialization <FlatBuffersPerformanceTestDesktop.FlatBufferReader> of Swift._ContiguousArrayBuffer._checkValidSubscript (Swift.Int) -> ()
0x3e8=1000. The loop counter is in ECX.

    0x100272dcd <+10477>: movl   $0x3e8, %ecx
I can't figure out what's at these two addresses; lldb didn't seem to accept any reasonable syntax. lldb is terrible. But I'll bet that RBX is holding the value of `total'. I don't know what r14 is, and it doesn't seem to matter since nothing here uses it.

    0x100272dd2 <+10482>: movq   -0xd8(%rbp), %rbx
    0x100272dd9 <+10489>: movq   -0x198(%rbp), %r14
Here's the loop. The loop is unrolled 5 times. total+=result. 0x100274454 produces some kind of exception on integer overflow.

    0x100272de0 <+10496>: addq   %rax, %rbx
    0x100272de3 <+10499>: jb     0x100274454 ; at flatbench.swift:284
    0x100272de9 <+10505>: addq   %rax, %rbx
    0x100272dec <+10508>: jb     0x100274454
    0x100272df2 <+10514>: addq   %rax, %rbx
    0x100272df5 <+10517>: jb     0x100274454
    0x100272dfb <+10523>: addq   %rax, %rbx
    0x100272dfe <+10526>: jb     0x100274454
    0x100272e04 <+10532>: addq   %rax, %rbx
    0x100272e07 <+10535>: jb     0x100274454
The loop was unrolled 5 times, so drop 5 from the loop counter and repeat.

    0x100272e0d <+10541>: addq   $-0x5, %rcx
    0x100272e11 <+10545>: jne    0x100272de0 ; at flatbench.swift:276
Get current time.

    0x100272e13 <+10547>: callq  0x1002b7b38 ; CFAbsoluteTimeGetCurrent
So this code actually times one call to flatDecodeDirect, then 200 iterations of an unrolled do-nothing loop. The compiler has figured out somehow that flatDecodeDirect is going to do exactly the same thing each time, and taken advantage of that by calling it only once. I'm guessing this means that flatDecodeDirect is only called 1,000 times in total.

As a sanity check for this kind of thing - try making a little loop that just increments an integer the appropriate number of times, and see how long that takes. (Check the assembly language output to ensure the generated code is doing what you think - it should be a 2-instruction loop.)

On my laptop that takes 1.8ms. This isn't the absolute limit of how long it takes to do 1,000,000 of anything, but it'll do as a rough estimate. So you should be suspicious if a program suggests it's taking much less time than that to do 1,000,000 of something that's a lot more complicated, as the test did. (It reported 1,000,000 iterations in 0.53ms on my PC.)

(Of course, as with any rough estimate, this only gives you a suspicion, and isn't proof without further investigation.)

1 comments

I was suspicious, I just could not put my finger on it :)

   for i in 0..<iterations {
     let result = flatuseStruct(outputData, start:i)
     assert(result == 8644311666 + Int(i))
     total2 = total2 + UInt64(result)
   }
this results in around 42ms compared to C 25ms. I guess I should update my blog post :)

Thanks for your help.