by _Wintermute 1244 days ago
It's a really sticky misconception. I've seen many beginners telling others to "never ever use loops in R", and so you end up with nested sapply()s or whatever soon-to-be-deprecated tidyverse functions are in vogue that nobody can reason about.

Agreed. The most common reason loops become bottlenecks is people "adding onto" vectors or dataframes. This causes a whole new vector to be created, the data from the old one copied into it, and then the new data filled in at the end. You'll rarely notice the performance hit unless you stick it in a loop that runs tens of thousands of times.

For those who want to avoid it and still use a loop, you can create a vector beforehand with the final length and fill it in. If you don't know the final length, create a vector with a good guess for length, double its length whenever it gets full, and then crop off the unused tail when you're done.
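To make the doubling strategy above concrete, here is a minimal sketch, written in Python rather than R. The function name `collect_with_doubling` and the `initial_capacity` parameter are made up for illustration. Python lists already over-allocate on append, so you would not need this in Python itself; it only mirrors what you would do by hand with an R vector.

```python
def collect_with_doubling(values, initial_capacity=4):
    """Fill a preallocated buffer, doubling it whenever it runs out of room.

    Hypothetical helper for illustration; returns the filled buffer and the
    number of times the buffer had to be grown.
    """
    capacity = max(1, initial_capacity)
    buf = [None] * capacity          # preallocate with a guess for the final length
    n = 0                            # number of slots actually filled
    resizes = 0
    for v in values:
        if n == capacity:
            buf.extend([None] * capacity)   # buffer is full: double its length
            capacity *= 2
            resizes += 1
        buf[n] = v                   # fill in place, no full copy per element
        n += 1
    del buf[n:]                      # crop off the unused tail
    return buf, resizes


# 10 values with an initial capacity of 4 need two doublings (4 -> 8 -> 16).
result, resizes = collect_with_doubling(range(10))
print(len(result), resizes)
```

The point of doubling is that the number of resizes grows with the logarithm of the final length, instead of one full copy per appended element.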

So Rob Pike’s rules 1 and 2 again:

Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

https://users.ece.utexas.edu/~adnan/pike.html
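One way to apply rule 2 to the loop question in this thread is to time both versions before rewriting anything. A rough sketch using Python's standard `timeit` module; the function names are hypothetical, and tuple concatenation is only a stand-in for the copy-on-append behaviour described for R vectors.

```python
import timeit


def grow_by_copy(n):
    """Append by building a new tuple each time (copies all old elements)."""
    out = ()
    for i in range(n):
        out = out + (i,)
    return list(out)


def fill_preallocated(n):
    """Allocate the final length once, then fill in place."""
    out = [0] * n
    for i in range(n):
        out[i] = i
    return out


def measure(fn, n, repeats=3):
    """Best wall-clock time in seconds of `repeats` single runs of fn(n)."""
    return min(timeit.repeat(lambda: fn(n), number=1, repeat=repeats))


if __name__ == "__main__":
    # Sizes are arbitrary; pick ones that match your real workload.
    for n in (1_000, 8_000):
        slow = measure(grow_by_copy, n)
        fast = measure(fill_preallocated, n)
        print(f"n={n}: copy-append {slow:.4f}s, preallocated {fast:.4f}s")
```

The numbers depend on the machine and interpreter, which is rather the point: run it on the sizes you actually care about before deciding the loop is the problem.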

That holds true in general, but when doing numerical calculations on a large amount of data, taking speed into account is necessary. You usually know the approximate time penalty for not doing so, and can weigh the extra time spent coding against the time spent waiting for results.

For example, if I am writing a toy neural network with a small dataset, I don't care how optimized it is, or if it runs slowly on the CPU.

But when training a large network on a large amount of data, it is well worth spending extra effort from the start to ensure as much work as possible is done on a GPU, and writing it so that it can support multiple GPUs.

That's some pretty generic premature optimization cargo culting.

If you have a huge data set and some understanding of what you're doing, the bottlenecks will be pretty obvious.

Apparently not obvious enough for people to estimate in advance if they should avoid looping.

Pike's point is not just to avoid premature optimization. It’s to measure bottlenecks, because as languages and hardware change, what you thought you knew to be true might be outdated.

One of the reasons why Matt Godbolt created Compiler Explorer was to prove to teammates that what seemed pretty obvious to them actually wasn't.

In Julia I do not need the Godbolt compiler explorer. The macros `@code_llvm` and `@code_native` show me the LLVM IR and native code for a function.

  julia> @code_llvm debuginfo=:none 5.0 + 3
  define double @"julia_+_156"(double %0, i64 signext %1) #0 
  {
  top:
    %2 = sitofp i64 %1 to double
    %3 = fadd double %2, %0
    ret double %3
  }

  julia> @code_native debuginfo=:none 5.0 + 3
  ...
    vcvtsi2sd %rdi, %xmm1, %xmm1
    vaddsd %xmm0, %xmm1, %xmm0
    retq
  ...

There's a reason Godbolt was made for C/C++ rather than Python/R. In a fast language you need to know what the compiler is doing to know what's slow. In a slow language, the slow part is pretty much always just "code that does anything in the language".

Python is only slow because so far there has been a huge disregard for JIT implementations, compared with how other dynamic languages have decided to deal with performance issues.

PyPy definitely shows that Python could be 5x faster than it is; however, this would still be ~10x slower than Julia/C/C++ (and R is roughly 5-10x slower than Python now).

Avoid the allure of premature optimization

But embrace the repulsion from belated pessimization. As Len Lattanzi said, it's the leaf of no good.

> so you end up with nested sapply()s

And that's usually not even vectorizing anything, it just hides the for-loop that is buried somewhere in the apply-code...