g++ -o hanoi hanoi.cpp -O2 -lrt
So I'm seeing results similar to yours. Your iterative implementation is about 3x faster than the author's, but it's still not as fast as the recursive version. I'm surprised!
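
For anyone who wants to try this kind of comparison themselves, here is a minimal sketch. It is not the code from this thread: `hanoi_recursive`, `hanoi_iterative`, and `time_ms` are hypothetical names, the iterative version uses the well-known bit trick for picking pegs (an assumption, since the implementations being compared aren't shown here), and timing uses `std::chrono::steady_clock` rather than whatever timer the original used. The iterative version also simulates the pegs so that each move can be checked, which adds overhead a real benchmark would want to remove.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recursive solver: returns the number of moves needed to shift n disks
// from peg `from` to peg `to`, using `via` as the spare peg.
std::uint64_t hanoi_recursive(int n, int from, int to, int via) {
    if (n == 0) return 0;
    std::uint64_t moves = hanoi_recursive(n - 1, from, via, to);
    ++moves;  // move disk n from `from` to `to`
    return moves + hanoi_recursive(n - 1, via, to, from);
}

// Iterative solver (assumed technique: the classic bit trick).
// Move number i (1-based) goes from peg (i & (i-1)) % 3
// to peg ((i | (i-1)) + 1) % 3. Requires 0 <= n < 64.
// The pegs are simulated so that illegal moves are detected.
// Returns true if every move was legal and all disks ended on one peg.
bool hanoi_iterative(int n, std::uint64_t& moves) {
    std::vector<int> peg[3];
    for (int d = n; d >= 1; --d) peg[0].push_back(d);  // largest disk at the bottom
    moves = 0;
    const std::uint64_t total = (std::uint64_t{1} << n) - 1;
    for (std::uint64_t i = 1; i <= total; ++i) {
        const int from = static_cast<int>((i & (i - 1)) % 3);
        const int to = static_cast<int>(((i | (i - 1)) + 1) % 3);
        if (peg[from].empty()) return false;
        const int disk = peg[from].back();
        if (!peg[to].empty() && peg[to].back() < disk) return false;
        peg[from].pop_back();
        peg[to].push_back(disk);
        ++moves;
    }
    // With this trick the tower ends on peg 2 for odd n and peg 1 for even n.
    const int target = (n % 2 == 1) ? 2 : 1;
    return peg[target].size() == static_cast<std::size_t>(n);
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare the two, wrap each call in `time_ms`, for example `time_ms([&] { total = hanoi_recursive(25, 0, 2, 1); })`, and print or otherwise use the result so that the compiler can't optimize the work away at `-O2`.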