by bazizbaziz 2847 days ago
This comment is excellent. The title of the original post should be: "When optimising code, never guess, always read the bytecode/assembly."

Without actually reading the assembly/bytecode/etc., you end up speculating about silly things like 'the two evaluations and assignments can happen in parallel, and so may happen on different cores.'


Possibly a subtitle: It's never what you think it is. I've worked in embedded and real time environments before and optimisation used to be my thing, but I am always surprised at how badly I guess what the problem is. It's a hard lesson to teach other people though because programmers are smart and smart people tend to default to thinking that they are correct :-)

But you are right: even when you isolate where the code is slow, you've still got a lot of work to do to find out why it is slow.

Indeed, I was unduly influenced by the code I was writing in the late 80s and early 90s that really did take languages with multiple assignments like this and ran them on different processors. You say it's a silly thing, but we used to do it - things have changed.

Added in edit: The magic term is "execution unit" not "core". As I say, things have changed, and the bundling of multiple execution units into each core, and multiple cores into each processor, is different in interesting and subtle ways from the situation I used to code, where I had a few hundred, or a few thousand, processors in each machine, but the individual processors were simpler.

I didn't mean to say this was a silly thing to do - most modern processors execute instructions out of order on multiple ALUs.

The problem is that the abstraction layer between the Python code in question and the processor's instruction stream is so thick that it's hard to say whether the processor is indeed executing that particular pair of instructions in parallel. It's definitely executing many instructions out of order, but it's unclear (without inspecting the Python interpreter and its assembly) what's happening at the machine level.

Looking at the bytecode of the Python program at least begins to tell us that the bytecode of the two versions is fundamentally different, which could account for the performance difference. What exactly makes the material difference is still under debate elsewhere in the thread, though. :)
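For anyone who wants to try this, here is a rough sketch of comparing bytecode with the standard `dis` module. The two functions are hypothetical stand-ins I made up (one does both assignments in a single statement, the other does them separately); they are not the code from the original post, and the exact opcodes vary between CPython versions.

```python
import dis

def tuple_assign(a, b):
    # Hypothetical stand-in: both assignments in one statement.
    x, y = a + 1, b + 1
    return x, y

def separate_assign(a, b):
    # Hypothetical stand-in: the same work as two statements.
    x = a + 1
    y = b + 1
    return x, y

def opnames(func):
    # List the opcode names CPython generated for the function.
    return [ins.opname for ins in dis.get_instructions(func)]

tuple_ops = opnames(tuple_assign)
separate_ops = opnames(separate_assign)

if __name__ == "__main__":
    # Full human-readable listings, one after the other.
    dis.dis(tuple_assign)
    dis.dis(separate_assign)
    print("same opcode sequence:", tuple_ops == separate_ops)
```

This only shows that the interpreter is handed different instruction sequences. It says nothing by itself about which one is faster, or why.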

I'd say without measuring first you have too much to look through for anything large enough to be interesting. So measure, drill in, eliminate some first obvious culprits, measure again, then look at assembly.

I say "obvious" because sometimes it really is, like doing a linear search in a hotspot where a binary tree should be used, and things like that.
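To make that kind of culprit concrete, a minimal sketch. Note that I'm swapping in a binary search over a sorted list (via the standard `bisect` module) rather than an actual binary tree, and both function names are made up for illustration.

```python
from bisect import bisect_left

def linear_contains(sorted_items, target):
    # O(n): walks the list element by element.
    for item in sorted_items:
        if item == target:
            return True
    return False

def binary_contains(sorted_items, target):
    # O(log n): relies on the list already being sorted.
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

if __name__ == "__main__":
    data = list(range(0, 1_000_000, 2))  # sorted even numbers
    print(linear_contains(data, 999_998), binary_contains(data, 999_998))
    print(linear_contains(data, 999_999), binary_contains(data, 999_999))
```

Both give the same answers. The difference only matters if the lookup really is in the hotspot, which is what the measuring step is for.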

Measuring is more important. Knowing "a is faster than b" without understanding why is more useful than guessing what should be faster based on assembly and getting it wrong because you don't perfectly understand how the CPU is actually executing your code.

Looking at assembly/byte code is interesting to understand what you could tweak, but again you need measurements to verify.
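A minimal sketch of the "measure to verify" step using the standard `timeit` module. `variant_a` and `variant_b` are hypothetical placeholders for whatever two versions you are comparing, and the repeat/number values are arbitrary choices.

```python
import timeit

def variant_a(n=1000):
    # Hypothetical version A: explicit loop.
    total = 0
    for i in range(n):
        total += i
    return total

def variant_b(n=1000):
    # Hypothetical version B: built-in sum.
    return sum(range(n))

def best_time(func, repeat=5, number=200):
    # Take the minimum over several runs to reduce noise
    # from other activity on the machine.
    return min(timeit.repeat(func, repeat=repeat, number=number))

time_a = best_time(variant_a)
time_b = best_time(variant_b)

if __name__ == "__main__":
    print(f"variant_a: {time_a:.6f}s  variant_b: {time_b:.6f}s")
```

The numbers depend on the machine and the interpreter version, so rerun this after every tweak rather than trusting an earlier result.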