by em500 1244 days ago
So Rob Pike’s rules 1 and 2 again:

Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

https://users.ece.utexas.edu/~adnan/pike.html


That holds true in general, but when doing numerical calculations on a large amount of data, taking speed into account is necessary. You usually know approximately the time penalty for not doing so, and can weigh the extra time spent coding versus the time spent waiting for results.

For example, if I am writing a toy neural network with a small dataset I don't care how optimized it is, or if it runs slowly on the CPU.

But when training a large network on a large amount of data it is well worth spending extra effort from the start to ensure as much work as possible is done on a GPU, and writing it so that it can support multiple GPUs.

That's some pretty generic premature optimization cargo culting.

If you have a huge data set and some understanding of what you're doing, the bottlenecks will be pretty obvious.

Apparently not obvious enough for people to estimate in advance if they should avoid looping.

Pike’s point is not just to avoid premature optimization. It’s to measure bottlenecks, because as languages and hardware change, what you thought you knew to be true might become outdated.
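To make the "should I avoid looping" question concrete, here is a minimal sketch of measuring it rather than guessing, using Python's standard `timeit` module. The functions `loop_sum` and `builtin_sum` and the data size are made up for illustration; the numbers will differ by machine and interpreter version, which is rather the point.

```python
import timeit


def loop_sum(values):
    """Sum with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Sum with the built-in, which iterates in C."""
    return sum(values)


# Hypothetical workload: 100,000 small integers.
data = list(range(100_000))

# Both versions must agree before timing them means anything.
assert loop_sum(data) == builtin_sum(data)

# timeit.repeat returns one total per repeat; the minimum is the least noisy.
loop_time = min(timeit.repeat(lambda: loop_sum(data), number=20, repeat=3))
builtin_time = min(timeit.repeat(lambda: builtin_sum(data), number=20, repeat=3))

print(f"loop:    {loop_time:.4f} s for 20 runs")
print(f"builtin: {builtin_time:.4f} s for 20 runs")
```

Whether the gap is big enough to matter depends on how much of the total run time this code accounts for, which is Rule 2 again.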

One of the reasons Matt Godbolt created Compiler Explorer was to prove to teammates that what was pretty obvious to them actually wasn't.
In Julia I do not need the Godbolt compiler explorer. The macros `@code_llvm` and `@code_native` show me the LLVM IR and native code for a function.

  julia> @code_llvm debuginfo=:none 5.0 + 3
  define double @"julia_+_156"(double %0, i64 signext %1) #0 {
  top:
    %2 = sitofp i64 %1 to double
    %3 = fadd double %2, %0
    ret double %3
  }

  julia> @code_native debuginfo=:none 5.0 + 3
  ...
    vcvtsi2sd %rdi, %xmm1, %xmm1
    vaddsd %xmm0, %xmm1, %xmm0
    retq
  ...
There's a reason Godbolt was made for C/C++ rather than Python/R. In a fast language you need to know what the compiler is doing to know what's slow. In a slow language, the slow part is pretty much always just "code that does anything in the language".
Python is only slow because so far there has been a huge disregard for JIT implementations, in contrast to how other dynamic languages have decided to deal with performance issues.
PyPy definitely shows that Python could be 5x faster than it is; however, this would still be ~10x slower than Julia/C/C++ (and R is roughly 5-10x slower than Python now).
Yes, but it will never be part of the development workflow of most Python developers, and the ongoing CPython JIT work sponsored by Microsoft will hardly win any JIT performance prizes, given its design goals.

In any case, there is a reason why Python has a profiler in the box: what one thinks and what actually is aren't the same. Which was my starting point.
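For anyone who hasn't used it, a minimal sketch of that in-the-box profiler (`cProfile` plus `pstats`, both in the standard library). The functions `parse`, `checksum` and `pipeline` are hypothetical stand-ins for real work, not anything from this thread.

```python
import cProfile
import io
import pstats


def parse(n):
    """Hypothetical stage 1: build n strings."""
    return [str(i) for i in range(n)]


def checksum(items):
    """Hypothetical stage 2: add up the string lengths."""
    return sum(len(s) for s in items)


def pipeline(n):
    return checksum(parse(n))


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = pipeline(200_000)
profiler.disable()

# Write the report into a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The same report is available without touching the code via `python -m cProfile -s cumulative script.py`. Which of the two stages dominates is exactly the kind of thing I'd rather read off the report than guess.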

Avoid the allure of premature optimization
But embrace the repulsion from belated pessimization. As Len Lattanzi said, it's the leaf of no good.