|
|
|
|
|
by rsp1984
3643 days ago
|
|
This. The article does not explain at all where exactly all the processor cycles are going. It's basically just saying "it's not the Python languages' fault!" but fails to name a specific culprit. It says it's spending the cycles in the "C Runtime" but what exactly does it (have to) do in the C Runtime that eats up performance? |
|
The article is actually saying the exact opposite. It claims Python the Language is slow because the opcodes need to do a lot of work according to the language specification. Python is not slow because the core team has done a poor job implementing the opcode interpreter and runtime.
When you have a language with thin opcodes that map closely to processor instructions then compiler improvements lead to smarter opcode generation which translates to efficient machine code after jitting. When you have fat opcodes you're SOL.
Consider this: an instruction like STORE_ATTR (implements obj.name = value) has to check whether the top of the stack refers to an object, then whether writing to the object attributes is allowed. Then "name" has to be checked if it's a string. Perhaps additional checking is needed to normalize the string or other unicode testing. Then the assignment has to happen which is a dictionary update (internally a hash map insertion which may trigger a resize). This is just the tip of the iceberg. A lot more stuff is happening and the correct exceptions are thrown when the instruction is used incorrectly (which leads to code bloat which hurts the instruction cache).
A thin bytecode instruction for STORE_ATTR could actually reduce the store to a single LEA machine code instruction (Load Effective Address).
The downside of a language with a thin instruction set is that the individual instructions can't validate their input. They have to trust that the compiler did its job correctly, a segfault or memory corruption will happen otherwise. One of Guido's goals when creating Python was that the runtime should never ever crash, even on nonsensical input. This pretty much rules out a thin instruction set right from the start. Simple concurrency is also a lot easier with a fat instruction set, because the interpreter can just yield in between instructions (Global interpreter lock). With a thin instruction set there is no clear delineation between instructions where all memory is guaranteed to be in a stable state. So a different locking model is needed for multi-threading, which adds even more complexity to the compiler and runtime.