| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by david-given 3929 days ago

Hey, awesome! And it produces a true goto as well, working on the bytecode level. I wonder if there's a way to remove the extra nops?

This is really, really useful for doing language-to-language translation. Without the ability to arbitrarily branch from basic block to basic block, it's possible for the basic block graph produced by a program in one language to be simply inexpressible in your target language. You end up having to fake it with a while loop and switch statement, which has horrible performance implications.

(Emscripten has some exotic algorithms to try and express the source program's basic block graph using the target language's structured programming constructs. This is vitally important to make goto-less Javascript perform well, but it can't work in all cases, and some C functions will force Emscripten to fall back to loop-and-switch. And, of course, these functions are typically the ones which have been carefully hand-optimised for speed...)

2 comments

wallunit 3929 days ago

> I wonder if there's a way to remove the extra nops?

Yes, technically it would be possible to remove the code instead injecting NOPs. But then I'd have to adjust the jump targets. However, I don't think these NOPs are too bad. Note that goto jumps directly after the NOP ramp of the label, and the NOPs of the goto itself are never reached. The only scenario where NOPs are actually seen by the interpreter is when the natural code path visits a label.

EDIT: Instead re-assembling the bytecode and adjusting jumps, I went to use JUMP_FORWARD(3) instead 7 NOPs now. Note that there are still 4 NOPs left to fill the gap, these however are never executed as they are skipped by the preceding jump instruction. https://github.com/snoack/python-goto/commit/2b0f5e5069cbb88...

link

masklinn 3929 days ago

Is it certain that JUMP_FORWARD is faster than a few NOPs?

link

wallunit 3929 days ago

I did run the example from the README, with "%timeit range(0, 1000)" in ipython:

10000 loops, best of 3: 72.9 µs per loop @ CPython 2.7.10 with NOP

10000 loops, best of 3: 77.2 µs per loop @ CPython 2.7.10 with JUMP_FORWARD

10000 loops, best of 3: 106 µs per loop @ CPython 3.5.0 with NOP

10000 loops, best of 3: 106 µs per loop @ CPython 3.5.0 with JUMP_FORWARD

100000 loops, best of 3: 8.6 µs per loop @ PyPy 2.4.0 with NOP

100000 loops, best of 3: 8.7 µs per loop @ PyPy 2.4.0 with JUMP_FORWARD

To my surprise, in fact, JUMP_FORWARD isn't any faster than 7 NOPs. In Python 2.7, JUMP_FORWARD is even slower. So reverted:

https://github.com/snoack/python-goto/commit/d19d244a9e5efdf...

Thanks for the pointer!

link

david-given 3929 days ago

D'oh. Yes, of course.

It'd be interesting to know whether peculiar basic block graphs upset PyPy's JIT. Apparently Lua's JIT is much less good at optimising programs that don't look like normal Lua.

link

masklinn 3929 days ago

> I wonder if there's a way to remove the extra nops?

Sure you just have to recompute all the jump destinations and patch the jumps themselves.

link

userbinator 3929 days ago

Jump destination computation was the subject of a recent HN item:

https://news.ycombinator.com/item?id=10219007

link