| HN Mirror


	by rsp1984 3643 days ago
	This. The article does not explain at all where exactly all the processor cycles are going. It's basically just saying "it's not the Python language's fault!" but fails to name a specific culprit. It says it's spending the cycles in the "C Runtime", but what exactly does it (have to) do in the C Runtime that eats up performance?

gizmo 3643 days ago

> It's basically just saying "it's not the Python language's fault!"

The article is actually saying the exact opposite. It claims Python the Language is slow because the opcodes need to do a lot of work according to the language specification. The slowness is not down to the core team having done a poor job implementing the opcode interpreter and runtime.

When you have a language with thin opcodes that map closely to processor instructions, compiler improvements lead to smarter opcode generation, which translates to efficient machine code after jitting. When you have fat opcodes you're SOL.

Consider this: an instruction like STORE_ATTR (implements obj.name = value) has to check whether the top of the stack refers to an object, then whether writing to the object's attributes is allowed. Then "name" has to be checked to see whether it is a string. Perhaps additional checking is needed to normalize the string, or other unicode testing. Then the assignment has to happen, which is a dictionary update (internally a hash map insertion, which may trigger a resize). This is just the tip of the iceberg. A lot more stuff is happening, and the correct exceptions are raised when the instruction is used incorrectly (which leads to code bloat, which hurts the instruction cache).
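
To make that concrete, here is a minimal sketch using the standard `dis` module. The names `Point`, `Frozen`, `set_name` and `try_store` are made up for illustration. It only shows the behaviour visible from Python (one STORE_ATTR instruction, a dictionary entry on success, an exception on misuse), not the exact sequence of checks CPython performs internally.

```python
import dis


class Point:
    """Ordinary class: instances keep attributes in a per-instance dict."""


class Frozen:
    """Hypothetical class that refuses every attribute write."""

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot set {name!r} on a Frozen instance")


def set_name(obj, value):
    obj.name = value  # compiles to one STORE_ATTR instruction


# The single source-level assignment is a single bytecode instruction.
opnames = [ins.opname for ins in dis.get_instructions(set_name)]

# Happy path: the store ends up as an entry in the instance dictionary.
p = Point()
set_name(p, 42)
stored = p.__dict__["name"]


def try_store(target):
    """Return the exception type raised by the store, or None on success."""
    try:
        set_name(target, 42)
    except Exception as exc:
        return type(exc)
    return None


# The same instruction must also cope with targets that reject the write.
frozen_error = try_store(Frozen())  # user-defined __setattr__ hook runs
int_error = try_store(3)            # ints have no writable 'name' attribute
```

The same opcode handles a plain instance, a class with its own `__setattr__` hook, and an object that cannot take the attribute at all, which is why it cannot be compiled down to a fixed-offset store without extra knowledge about the types involved.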

A thin bytecode instruction for STORE_ATTR could actually reduce the store to a single LEA machine code instruction (Load Effective Address).

The downside of a language with a thin instruction set is that the individual instructions can't validate their input. They have to trust that the compiler did its job correctly; otherwise a segfault or memory corruption will happen. One of Guido's goals when creating Python was that the runtime should never ever crash, even on nonsensical input. This pretty much rules out a thin instruction set right from the start. Simple concurrency is also a lot easier with a fat instruction set, because the interpreter can just yield in between instructions (Global interpreter lock). With a thin instruction set there is no clear delineation between instructions where all memory is guaranteed to be in a stable state. So a different locking model is needed for multi-threading, which adds even more complexity to the compiler and runtime.

chrisseaton 3643 days ago

All the problems you're describing are solved with a powerful JIT. And the core team do seem to be opposed to doing the work needed for that.

gizmo 3643 days ago

Python's philosophy chooses simplicity over everything else. Simple grammar. Simple bytecode instruction set. Simple CPython implementation. Simple threading model (GIL). Simple data structures. Adding a highly sophisticated and complicated JIT on top of that makes little sense.

It's not so difficult to create a high performance language that's much like Python. It's just not possible to make Python fast without abandoning some of its founding principles.

chrisseaton 3643 days ago

Why is a simple CPython implementation such an important requirement?

Portability? Make the JIT optional.

Ease of maintenance? Get a small team of experts to maintain it on behalf of everyone else.

Openness to beginners? That would be nice if possible as well, but CPython's job is to run programs rather than to educate.

A JIT needn't make the grammar, bytecode or threading model more complex. It would make data structures and the implementation more complex, but do you not think that's worth it if Python could be twice as fast?

orf 3643 days ago

> CPython's job is to run programs rather than to educate.

CPython's 'job' is to be the reference implementation of Python.

chrisseaton 3643 days ago

But that's just not the case in reality, is it? In reality it's the main production implementation, and its inefficiency costs the world wasted resources every day.

If readability and being the reference implementation are more important than performance, why is Python implemented in C rather than a higher level language?

gshulegaard 3639 days ago

To be fair, the GIL wasn't included because it was a simple threading model (AFAIK). It was included because it was simple to implement and it was/is fast(er) (than removing it)[1][2].

If the Gilectomy [2] project succeeds, Guido has mentioned he would consider it for Python 3.6+ [3].

[1] http://www.artima.com/weblogs/viewpost.jsp?thread=214235

[2] https://www.youtube.com/watch?v=P3AyI_u66Bw

[3] https://youtu.be/YgtL4S7Hrwo?t=10m59s

pjmlp 3643 days ago

Hence why I'd rather support Julia and leave Python for shell-scripting-like tasks.

astrobe_ 3643 days ago

*cough* sufficiently smart compiler *cough*.

chrisseaton 3643 days ago

No, we have compilers that can do these things today.

nkurz 3643 days ago

A small nit: LEA does the calculation but doesn't read from or write to that address. In times of old, this instruction used the memory addressing port to do the calculation, but these days it's just a normal arithmetic instruction, with the slight difference that it doesn't set the flags based on the result. Instead, the ideal would be for the bytecode to reduce to a single MOV. In addition to loading and storing, MOV itself supports several forms of indexed and indirect address calculation, which execute on dedicated ports without adding latency.
