Er, the author missed a crucial point: "just" use PyPy. For me it gives over a 24x speedup over standard Python (if you run it enough times for the JIT to warm up). Know your tools.
For that matter, his numpy code, which wasn't described until the comments, was terrible. Unlike all the other versions, it included a complete reallocation, conversion and copying step, which accounted for almost the entire processing time.
People in the comments who tried a reasonable numpy version found it to be around the same speed as the unoptimized C version.
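To make the point about the conversion step concrete, here is a rough sketch of the two patterns. It is not the article's code (which I'm only inferring from the description above); the function names and the int64 dtype are my own assumptions. The first version rebuilds the array from a Python list on every call, the second sums an array that already exists.

```python
import timeit

import numpy as np


def sum_with_conversion(values):
    # Hypothetical "bad" pattern: allocate a new array, convert every
    # Python int, and copy it in, all inside the timed call.
    return int(np.array(values, dtype=np.int64).sum())


def sum_preconverted(arr):
    # Hypothetical "reasonable" pattern: the data is already a numpy
    # array, so only the reduction itself is timed.
    return int(arr.sum())


if __name__ == "__main__":
    values = list(range(1000000))
    arr = np.array(values, dtype=np.int64)  # one-time conversion, not timed

    t_conv = min(timeit.repeat(lambda: sum_with_conversion(values),
                               repeat=3, number=5)) / 5
    t_pre = min(timeit.repeat(lambda: sum_preconverted(arr),
                              repeat=3, number=5)) / 5
    print("with conversion: %.6f s per call" % t_conv)
    print("preconverted:    %.6f s per call" % t_pre)
```

Whether the one-time conversion should count depends on where the data comes from, but if the other versions get their input in their native format for free, the numpy version should too.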
I got 25x on sum_naive_python() and 6x on sum_native_python() (PyPy 2.7.3 vs CPython 2.7.6 on Intel Arrandale), using timeit to measure time and not cycles.
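For anyone who wants to repeat the measurement, something like the sketch below should work under both CPython and PyPy. The two function bodies are my stand-ins based only on the names (an explicit loop and the built-in sum()); the article's actual definitions may differ, and best_time() and its parameters are hypothetical helpers. The untimed warm-up calls are there so PyPy's JIT has a chance to compile the loop before timing starts.

```python
import timeit


def sum_naive_python(values):
    # Assumed stand-in: explicit Python-level loop.
    total = 0
    for v in values:
        total += v
    return total


def sum_native_python(values):
    # Assumed stand-in: the built-in sum().
    return sum(values)


def best_time(func, data, warmup=5, repeat=5, number=20):
    # Untimed warm-up calls, so a JIT (e.g. PyPy) can compile first.
    for _ in range(warmup):
        func(data)
    timer = timeit.Timer(lambda: func(data))
    # Take the minimum over several repeats to reduce noise,
    # then normalise to seconds per call.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    data = list(range(10000))
    for f in (sum_naive_python, sum_native_python):
        print("%s: %.8f s per call" % (f.__name__, best_time(f, data)))
```

Run the same file with each interpreter and divide the per-call times to get the speedup.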
But IMO it's a nice article anyways. I'd wager the final implementation he comes up with is faster than PyPy.