Hacker News new | ask | show | jobs
by arc776 2071 days ago
I completely agree. Everything is baked in to be slow. There is no way around it, I don't think you can write super fast interpreters like with Javascript - I might be wrong, but so far it hasn't happened.

For general use cases the performance is fine, but only thanks to the hard work of C/CPython/Cython programmers who give up Python's rich expressibility to gain this performance. It seems like you simply have to use another language to get anything running fast.

Having said all that, Pyc seems interesting as it apparently compiles Python. Has anyone had any experience of this?

1 comments

> There is no way around it

What aspects of the language are you convinced cannot be optimised? There's tons of research in this area.

As the OP states:

> mutable interpreter frames, global interpreter locks, shared global state, type slots

On top of this, Python is extremely dynamic and nothing can be assured without running through the code. So this leads to needing JITs to improve performance which then give a slow start up time and increased complexity. Even with JIT, Python is just not fast thanks to the above issues and it's overall dynamism.

It can be optimised and for sure there's some impressive attempts at doing so. However I don't think pure Python will ever be considered "fast" as these things necessarily get in the way.

I highly recommend the two videos posted here that go into more detail as to why there are limits to how far optimisation can go: https://youtu.be/qCGofLIzX6g https://youtu.be/IeSu_odkI5I

> why there are limits to how far optimisation can go

I'd challenge the idea that there really are known 'limits'. As I say there's research towards this, these videos are old, and Armin and Seth may not be up to date with all of the literature (in fact I'm sure Seth is not, as he's missing at least one major current Python implementation research project from his blog post.)

> I'd challenge the idea that there really are known 'limits'.

There are good reasons why these limits cannot be overcome in that the complexity and dynamism of the language precludes it.

Being interpreted is one cost that sets a significant barrier to performance, and the dynamic complexity further compounds it. For example whereas JS is basically only functions, in Python you have a huge range of ways you can do incredibly complex things with slot wrappers, descriptors, and metaprogramming.

Ultimately, Python will get faster, but diminishing returns are inevitable. Python can never be as fast as the equivalent code in a compiled language. It simply has too much extra work to do.

> There are good reasons why these limits cannot be overcome in that the complexity and dynamism of the language precludes it.

Can you give specific examples and prove that they cannot be overcome?

How much of the literature have you read?

I'll give you a concrete example of how I see these claims - people said monkey-patching in Python and Ruby was a hard overhead to peak temporal performance and fundamentally added a cost that could not be removed... turns out no that cost can be completely eliminated. I could give you a list of similar examples as long as you want.

> ... turns out no that cost can be completely eliminated.

Is it eliminated in any production interpreter/VM used by Python, Ruby or any other mainstream language?

I mean, it's nice if it's research, but if I'm a boring programmer churning out Enterprise Middleware using these languages, do I get to use it?

Or is it just a pre-alpha branch of PyPy that might be out in 2025, if we're lucky? :-)

> Can you give specific examples and prove that they cannot be overcome?

It's hard to prove a theoretical negative, but perhaps by comparison with the run time performance of static AOT (SAOT) compiled languages I can show what I mean.

Dynamic typing:

- Python requires dynamic type checking before any program work is done. SAOT doesn't need to do any run time work.

- Adding two numbers in Python requires handling run time overloads and a host of other complexities. SAOT is a single machine code instruction that requires no extra work.

Boxing values:

- Python value boxing requires jumping about in heap memory. SAOT language can not only remove this cost but reduce it to raw registers loaded from the stack and prefetched in chunks. This massively improves cache performance by orders of magnitude.

Determinism:

- In Python program operation can only be determined by running the code. In SAOT, since all the information is known at compile time programs can be further folded down, loops unrolled, and/or SIMD applied.

Run time overheads

- Python requires an interpretive run time. SAOT does not.

In summary: Python necessarily requires extra work at run time due to dynamic behaviour. SAOT languages can eliminate this extra work.

I do understand though that with JIT a lot of these costs can be reduced massively if not eliminated once the JIT has run through the code once. For example here they go through the painful process of optimising Python code to find what is actually slowing things down, to the point of rewriting in C: http://blog.kevmod.com/2020/05/python-performance-its-not-ju...

At the end they point out that PyPy gives a very impressive result that is actually faster than their C code. Of course, this benchmark is largely testing unicode string libraries rather than the language itself and I'd argue this is an outlier.

> How much of the literature have you read?

Literature on speeding up Python or high performance computing? The former, very little, the latter, quite a lot. My background is in performance computing and embedded software.

I'm definitely interested in the subject though if you've got some good reading material?

> people said monkey-patching in Python and Ruby was a hard overhead to peak temporal performance and fundamentally added a cost that could not be removed... turns out no that cost can be completely eliminated.

This really surprised me. Completely eliminated? I'm really curious how this is possible. Do you have any links explaining this?

C has a slow startup time as well. We just call that compiling.

Having the option to be slow startup/fast execution is a good option to have. Maybe not for some, but definitely needed by others.

Chris already proved we can do exactly that for Ruby with TruffleRuby. I don’t think there’s any reason GraalPython couldn’t do the same given more work?
I'm not familiar with Ruby, but it doesn't seem like TruffleRuby is really competitive with languages known for performance.

These are only simple benchmarks, but do indicate a rough ballpark for TruffleRuby: https://github.com/kostya/benchmarks

As I understand it, Crystal would be a good Ruby alternative if you want performance. This is of course a whole new language designed with performance in mind from the beginning and here is a repeating theme: you need to consider performance at the start, not 20 years later.

Without running again using GraalVM EE which TruffleRuby needs for things like Partial Escape Analysis these benchmarks aren’t very useful. I’ve made the same mistake myself before when benchmarking TR.

Crystal has wildly different semantics to Ruby, so it’s not a good alternative at all.

Ah, that's a good point. It would be interesting to see fairer benchmarks for TruffleRuby after reading about some of the optimisations they're able to make.
Could we seriously start looking at deprecating some of the features that make Python slow? Who needs mutable interpreter frames?
Why not make mutable interpreter frames fast instead?