I had a binary parser written in Python that took around 30 seconds on typical input on CPython. PyPy took that down to about 10 seconds. Rewriting it in C# took it down to 200 ms.
It wasn't reading single bytes, but individual fields (float32, int32, string, etc.). Yes, I expected a much more significant speed-up from PyPy as well. It's probably because a lot of that code was driven by reflection-type techniques.
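To give a rough idea of what "reflection-type techniques" can look like, here is a minimal sketch, not the actual parser. The field names, the `FIELDS` layout, and both reader functions are made up for illustration, and strings are left out to keep it short. The first reader looks up a `read_<type>` method by name for every field of every record; the second uses a `struct.Struct` compiled once for the whole record.

```python
import struct

# Hypothetical record layout (name, type tag); not taken from the real parser.
FIELDS = [("id", "int32"), ("value", "float32"), ("count", "int32")]


class ReflectiveReader:
    """Resolves a read_<type> method by name for each field, on every record."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read_int32(self):
        (v,) = struct.unpack_from("<i", self.data, self.pos)
        self.pos += 4
        return v

    def read_float32(self):
        (v,) = struct.unpack_from("<f", self.data, self.pos)
        self.pos += 4
        return v

    def read_record(self):
        # getattr() with a string built at runtime is the "reflection" part:
        # one dynamic lookup and one small unpack call per field.
        return {name: getattr(self, "read_" + kind)() for name, kind in FIELDS}


# Alternative: describe the whole record once and unpack it in a single call.
RECORD = struct.Struct("<ifi")  # little-endian int32, float32, int32


def read_record_fast(data, offset=0):
    rid, value, count = RECORD.unpack_from(data, offset)
    return {"id": rid, "value": value, "count": count}


sample = struct.pack("<ifi", 7, 1.5, 3)
slow_result = ReflectiveReader(sample).read_record()
fast_result = read_record_fast(sample)
```

Both readers return the same dictionary for `sample`. How much the per-field dynamic dispatch actually costs on a given interpreter is something you would have to measure; I am only guessing that it is part of why PyPy helped less than expected.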
Curiously, IronPython did better than any of the other Python options (though it was still slow). I haven't tried Jython.
Compiling the whole thing with Cython was less effective than PyPy.