The Cython example in that link is actually not a fair comparison, since it still types its arguments as NumPy ndarrays in the function signature.
It should use typed memoryviews [0] instead, which are faster and avoid more of the cases where the code accidentally depends on the GIL (such as when an ndarray has to be treated as a Python object).
TL;DR: Pythran is very similar to Numba, but blazingly fast on the CPU.
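
For anyone who hasn't seen Pythran, here is a minimal sketch of what its input looks like. The function name `dprod` and the list-based signature are only for illustration and are not taken from the linked benchmark. It is an ordinary Python module plus a `#pythran export` comment that declares the argument types, where Numba would use a decorator such as `@njit`:

```python
# Plain Python; the comment on the next line is the export annotation
# that Pythran reads to learn the argument types (illustrative signature).
#pythran export dprod(float list, float list)
def dprod(xs, ys):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(xs, ys))
```

Assuming the file is saved as `dprod.py` (a hypothetical name), running `pythran dprod.py` should produce a native extension module; left uncompiled, the same file still runs as regular Python.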