Hacker News new | ask | show | jobs
by david-given 3929 days ago
Hey, awesome! And it produces a true goto as well, working on the bytecode level. I wonder if there's a way to remove the extra nops?

This is really, really useful for doing language-to-language translation. Without the ability to arbitrarily branch from basic block to basic block, it's possible for the basic block graph produced by a program in one language to be simply inexpressible in your target language. You end up having to fake it with a while loop and switch statement, which has horrible performance implications.

(Emscripten has some exotic algorithms to try and express the source program's basic block graph using the target language's structured programming constructs. This is vitally important to make goto-less Javascript perform well, but it can't work in all cases, and some C functions will force Emscripten to fall back to loop-and-switch. And, of course, these functions are typically the ones which have been carefully hand-optimised for speed...)

2 comments

> I wonder if there's a way to remove the extra nops?

Yes, technically it would be possible to remove the code instead injecting NOPs. But then I'd have to adjust the jump targets. However, I don't think these NOPs are too bad. Note that goto jumps directly after the NOP ramp of the label, and the NOPs of the goto itself are never reached. The only scenario where NOPs are actually seen by the interpreter is when the natural code path visits a label.

EDIT: Instead re-assembling the bytecode and adjusting jumps, I went to use JUMP_FORWARD(3) instead 7 NOPs now. Note that there are still 4 NOPs left to fill the gap, these however are never executed as they are skipped by the preceding jump instruction. https://github.com/snoack/python-goto/commit/2b0f5e5069cbb88...

Is it certain that JUMP_FORWARD is faster than a few NOPs?
I did run the example from the README, with "%timeit range(0, 1000)" in ipython:

10000 loops, best of 3: 72.9 µs per loop @ CPython 2.7.10 with NOP

10000 loops, best of 3: 77.2 µs per loop @ CPython 2.7.10 with JUMP_FORWARD

10000 loops, best of 3: 106 µs per loop @ CPython 3.5.0 with NOP

10000 loops, best of 3: 106 µs per loop @ CPython 3.5.0 with JUMP_FORWARD

100000 loops, best of 3: 8.6 µs per loop @ PyPy 2.4.0 with NOP

100000 loops, best of 3: 8.7 µs per loop @ PyPy 2.4.0 with JUMP_FORWARD

To my surprise, in fact, JUMP_FORWARD isn't any faster than 7 NOPs. In Python 2.7, JUMP_FORWARD is even slower. So reverted:

https://github.com/snoack/python-goto/commit/d19d244a9e5efdf...

Thanks for the pointer!

D'oh. Yes, of course.

It'd be interesting to know whether peculiar basic block graphs upset PyPy's JIT. Apparently Lua's JIT is much less good at optimising programs that don't look like normal Lua.

> I wonder if there's a way to remove the extra nops?

Sure you just have to recompute all the jump destinations and patch the jumps themselves.

Jump destination computation was the subject of a recent HN item:

https://news.ycombinator.com/item?id=10219007