by kbenson 938 days ago

Because for any nontrivial case you would expect Python plus a compiled library, with the associated marshaling of data, to be slower than that library in its native implementation, which needs no interop or marshaling.

When you see an interpreted language faster than a compiled one, it's worth looking at why, because most of the time there's some hidden issue causing the other to be slow (which could just be a different and much worse implementation).

Put another way, you can do a lot to make a Honda Civic very fast, but when you hear one goes up against a Ferrari and wins your first thoughts should be about what the test was, how the Civic was modified, and if the Ferrari had problems or the test wasn't to its strengths at all. If you just think "yeah, I love Civics, that's awesome" then you're not thinking critically enough about it.

3 comments

Attummm 938 days ago

In this case, Python's code (opening and loading the content of a file) operates almost fully within its C runtime.

The C components initiate the system call and manage the file pointer, which loads the data from the disk into a pyobj string.

Therefore, it isn't so much Python itself that is being tested, but rather Python's underlying C runtime.
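
To make that concrete, here is a minimal sketch of the kind of "open and load a file" test being discussed. The helper names (`make_temp_file`, `read_whole_file`, `time_read`) and the 16 MiB file size are assumptions for illustration, not taken from the original benchmark. The point is only that the Python side is a single `f.read()` call; the system calls and the buffer copy happen inside the C runtime.

```python
import os
import tempfile
import time


def make_temp_file(size_bytes):
    """Create a temporary file of the given size and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path


def read_whole_file(path):
    """Read the file in one call; the per-byte work is done in C, not in Python bytecode."""
    with open(path, "rb") as f:
        return f.read()


def time_read(path, repeats=5):
    """Return the best wall-clock time in seconds over several reads."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_whole_file(path)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    path = make_temp_file(16 * 1024 * 1024)  # assumed size, purely illustrative
    try:
        print(f"best read time: {time_read(path):.4f} s")
    finally:
        os.remove(path)
```

A timing like this mostly measures the OS page cache and the C read path, so it says little about interpreter speed either way.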

kbenson 938 days ago

Yep, and the next logical question when both implementations are for the most part bare metal (compiled and low-level), is why is there a large difference? Is it a matter of implementation/algorithm, inefficiency, or a bug somewhere? In this case, that search turned up a hardware issue that should be addressed, which is why it's so useful to examine these things.

heavyset_go 938 days ago

If you're staying within Python and its C extensions, there is no marshalling; you're dealing with raw PyObjects that are exposed to the interpreter.

lmm 938 days ago

> Because for any nontrivial case you would expect Python plus a compiled library, with the associated marshaling of data, to be slower than that library in its native implementation, which needs no interop or marshaling.

> When you see an interpreted language faster than a compiled one, it's worth looking at why, because most of the time there's some hidden issue causing the other to be slow (which could just be a different and much worse implementation).

On the contrary, compiled languages tend to be faster only in trivial benchmarks. In real-world systems, the Python-based systems tend to be faster because their developers haven't had to spend so long twiddling which integers they're using and debugging crashes and memory leaks, and got to spend more time on the problem.

kbenson 938 days ago

I don't doubt that can happen, but I'm also highly doubtful that it's the norm for large, established, mature projects with lots of attention, such as popular libraries and the standard library of popular languages. As time spent on the project increases, I suspect that any gain an interpreted language has over an (efficient) compiled one not only gets smaller, but eventually reverses in most cases.

So, like in most things, the details can sometimes matter quite a bit.

lmm 938 days ago

> I don't doubt that can happen, but I'm also highly doubtful that it's the norm for large, established, mature projects with lots of attention, such as popular libraries and the standard library of popular languages.

Code that has lots of attention is different, certainly, but it's also the exception rather than the rule; the last figure I saw was that 90% of code is internal business applications that are never even made publicly available in any form, much less subject to outside code review or contributions.

> As time spent on the project increases, I suspect that any gain an interpreted language has over an (efficient) compiled one not only gets smaller, but eventually reverses in most cases.

In terms of the limit of an efficient implementation (which certainly something like Python is nowhere near), I've seen it argued both ways; with something like K the argument is that a tiny interpreter that sits in L1 and takes its instructions in a very compact form ends up saving you more memory bandwidth (compared to what you'd have to compile those tiny interpreter instructions into if you wanted them to execute "directly") than it costs.
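
As a rough sketch of what "a very compact form" means here: the toy stack machine below takes its whole program as a `bytes` object, one byte per opcode or operand. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) are made up for illustration and have nothing to do with K's actual implementation, and a Python loop obviously doesn't demonstrate the cache behavior itself, only the encoding idea.

```python
# Hypothetical opcodes, invented for this sketch; not from K or any real VM.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3


def run(program: bytes) -> int:
    """Interpret a tiny stack-machine program encoded as raw bytes."""
    stack = []
    pc = 0  # index of the next byte to decode
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:
            # The operand is the single byte following the opcode.
            stack.append(program[pc])
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 1}")


# (2 + 3) * 4 encoded in 9 bytes.
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])

if __name__ == "__main__":
    print(len(program), run(program))  # 9 bytes, evaluates to 20
```

The argument is that the dispatch loop stays hot in cache while the program itself is fetched as a handful of bytes, versus the larger stream of machine instructions the same expression would compile into.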

JonChesterfield 937 days ago

> a tiny interpreter that sits in L1 and takes its instructions in a very compact form ends up saving you more memory bandwidth

There's a paper on this you might like. https://www.researchgate.net/publication/2749121_When_are_By...

I think there's something to the idea of keeping the program in the instruction cache by deliberately executing parts of it via interpreted bytecode. There should be an optimum around zero instruction cache misses, either from keeping everything resident, or from deliberately paging instructions in and out as control flow in the program changes which parts are live.

There are complicated tradeoffs between code specialisation and size. Translating some back and forth between machine code and bytecode adds another dimension to that.

I fear it's either the domain of extremely specialised handwritten code - luajit's interpreter is the canonical example - or of the sufficiently smart compiler. In this case a very smart compiler.

JonChesterfield 937 days ago

> On the contrary, compiled languages tend to be faster only in trivial benchmarks. In real-world systems, the Python-based systems tend to be faster because their developers haven't had to spend so long twiddling which integers they're using and debugging crashes and memory leaks, and got to spend more time on the problem.

This is an interesting premise.

Python in particular gets an absolute kicking for being slow. Hence all the libraries written in C or C++ and then wrapped in a Python interface. That's also why "Python was faster than Rust at anything" is headline-worthy.

I note your claim is that Python systems in general tend to be faster (outside of trivial benchmarks, whatever the scope of that is). Can you cite any single example where this is the case?

lmm 937 days ago

> Can you cite any single example where this is the case?

Plenty of line-of-business systems I've seen, but systems big enough to matter tend not to be public. Bitbucket's cloud and on-prem versions are the only case I can think of where you can directly compare something substantial between an implementation known to be written in Python and an implementation known to be written in C/C++ (and even then I'm not 100% sure that that's what they use).