| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by arc776 2071 days ago

> Can you give specific examples and prove that they cannot be overcome?

It's hard to prove a theoretical negative, but perhaps by comparison with the run time performance of static AOT (SAOT) compiled languages I can show what I mean.

Dynamic typing:

- Python requires dynamic type checking before any program work is done. SAOT doesn't need to do any run time work.

- Adding two numbers in Python requires handling run time overloads and a host of other complexities. SAOT is a single machine code instruction that requires no extra work.

Boxing values:

- Python value boxing requires jumping about in heap memory. SAOT language can not only remove this cost but reduce it to raw registers loaded from the stack and prefetched in chunks. This massively improves cache performance by orders of magnitude.

Determinism:

- In Python program operation can only be determined by running the code. In SAOT, since all the information is known at compile time programs can be further folded down, loops unrolled, and/or SIMD applied.

Run time overheads

- Python requires an interpretive run time. SAOT does not.

In summary: Python necessarily requires extra work at run time due to dynamic behaviour. SAOT languages can eliminate this extra work.

I do understand though that with JIT a lot of these costs can be reduced massively if not eliminated once the JIT has run through the code once. For example here they go through the painful process of optimising Python code to find what is actually slowing things down, to the point of rewriting in C: http://blog.kevmod.com/2020/05/python-performance-its-not-ju...

At the end they point out that PyPy gives a very impressive result that is actually faster than their C code. Of course, this benchmark is largely testing unicode string libraries rather than the language itself and I'd argue this is an outlier.

> How much of the literature have you read?

Literature on speeding up Python or high performance computing? The former, very little, the latter, quite a lot. My background is in performance computing and embedded software.

I'm definitely interested in the subject though if you've got some good reading material?

> people said monkey-patching in Python and Ruby was a hard overhead to peak temporal performance and fundamentally added a cost that could not be removed... turns out no that cost can be completely eliminated.

This really surprised me. Completely eliminated? I'm really curious how this is possible. Do you have any links explaining this?

1 comments

chrisseaton 2071 days ago

> I'm definitely interested in the subject though if you've got some good reading material?

My PhD's a good starting point on this subject https://chrisseaton.com/phd/, or I maintain https://rubybib.org/.

> This really surprised me. Completely eliminated? I'm really curious how this is possible. Do you have any links explaining this?

Through dynamic deoptimisation. Instead of checking if a method has been redefined... turn it on its head. Assume it has not (so machine code is literally exactly the same as if monkey patching was not possible), and get threads that want to monkey patch to stop other threads and tell them to start checking for redefined methods.

This is a great example because people said 'surely... surely... there will always be some overhead to check for monkey patching - no possible way to solve this can't be done' until people found the result already in the literature that solves it.

As long as you are not redefining methods in your fast path... it's literally exactly the same machine code as if monkey patching was not possible.

https://chrisseaton.com/truffleruby/deoptimizing/

arc776 2071 days ago

Thanks for the links. Very interesting to read how you get around these tricky dynamic operations!

> get threads that want to monkey patch to stop other threads and tell them to start checking for redefined methods

As an aside, this sort of reminds me of branch prediction at a higher level. A very neat way to speed up for the general case of no patching.

> This is a great example because people said 'surely... surely... there will always be some overhead to check for monkey patching - no possible way to solve this can't be done' until people found the result already in the literature that solves it.

There is still overhead when patching is used though. If you don't use the feature, you don't pay the cost, however when monkey patching is used there is a very definite cost to rewriting the JIT code and thread synchronisation that compiled languages would simply not have.

I can see where you're coming from here. If we can reduce all dynamic costs that aren't used to nothing then we will have the same performance as, say, C. At least, in theory.

It would be certainly be interesting to see a dynamic language that can deoptimise all its functionality to a static version to match a statically compiled language. Still, any dynamic features would nevertheless incur an increased cost over static languages.

It's the dynamism itself of Python that incurs the performance ceiling over static compilation, plus the issues I mentioned in my previous reply about boxing and cache pressures. However you've definitely given me some food for thought over how close the gap could potentially be.