Hacker News
by TheLogothete 3778 days ago
In my experience, a lot of the claims that R is slow are greatly exaggerated and made by people who don't actually use it. Kind of an echo chamber. Every time I see someone say they chose Python instead because of speed, I roll my eyes.
3 comments

My usual take on this: "Between R and Python, the faster language is likely whichever library author actually wrote most of their code in C or FORTRAN."
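A rough way to see this for yourself, as a sketch rather than a benchmark: time the same sum once as a loop in pure Python and once through the built-in `sum`, which CPython implements in C. The function name `python_loop_sum` and the input size are made up for illustration, and the timings will vary by machine and interpreter.

```python
import timeit


def python_loop_sum(values):
    """Add the values one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Same data, same result; only where the loop runs differs.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"pure Python loop: {loop_time:.4f}s")
print(f"built-in sum:     {builtin_time:.4f}s")
```

The same comparison could be made in R between an explicit `for` loop and a vectorized call.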
PyPy is actually much faster than both standard Python 2.7 and R in basically everything requiring the standard library.
It's funny you bring up Python. I say this not as a comment on your thesis, but as a related point: I often hear the "Python is slow" trope, and it's only half true. Speaking as a day-jobbing data pipeline engineer, you can typically write Python that is plenty "fast enough" if you implement with an understanding of what will drive you into the mud. This goes beyond just understanding the tool you're using: fundamentally, writing an O(N^N) algorithm or something similar is going to hurt even if you're in C#/C. I have seen that plenty, frankly, more than I've seen "legitimately slow Python".

Anyway, this was just a thought rolling around in my head given the discussion.
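To make the algorithmic point concrete, here is a minimal sketch with two ways to check a list for duplicates. The function names are hypothetical, and the linear version assumes the items are hashable. The first does O(N^2) comparisons in any language; the second does roughly O(N) work with a set.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of items: O(N^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember what has been seen in a set: about O(N) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = [3, 1, 4, 1, 5]
print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

Porting the quadratic version to C shrinks the constant factor, but it still loses to the linear version once the input gets large enough.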

I've translated plenty of numerical code from (pure-ish) python to c and c++, and usually get about a 100x speedup, sometimes as high as 800x, implementing the same algorithms.
At the risk of being overly pragmatic, note that I said "fast enough" and not "as fast as possible."

My comment was more on the perception that python is unworkably slow in many situations, where I can count the number of times on my hands that I've NEEDED to C-ify some hot paths.

If you're writing a plasma fluid simulation to run on an HPC cluster, yes, you probably damn well want some straight C/C++. Outside of similarly exceedingly high throughput situations, CPUs are normally more than fast enough, especially if the application in any way brushes up against people and thus falls into "human time" scales, in which case you'd typically be hard pressed to make things slow enough for someone to notice. (Yet somehow we find a way...)

To a sister post re: where the Python->C speedup can occur (to kill two birds with one stone): I imagine there's a lot of low-hanging fruit. To take one obvious example, anything the compiler can optimize away: memory read/address optimization, vectorization, potentially better support for branch prediction. I can handwave at more, but I am so far from a compilers type that I'd probably make a fool of myself.

> Outside of similarly exceedingly high throughput situations, CPUs are normally more than fast enough, especially if the application in any way brushes up against people and thus falls into "human time" scales, in which case you'd typically be hard pressed to make things slow enough for someone to notice.

This has simply not been my experience. (In a previous job I had reasonably optimized numerical python code sitting on the back end of an api and it was incredibly easy to go over our time budget).

For what it's worth, I believe you; I'd be curious what the workload was / what the time window was, if you're able to say?

I could certainly see myself as having been spoiled with respect to beefy hardware and feasible workload/SLA ratios, but it's led me to a prior where I take the age-old advice against premature optimization pretty aggressively. (Starting projects in Python, naive brute force implementations for a first pass, readability over a better O(N), etc.)

Nit, but throughput is not the only performance constraint that could rule out Python. The last substantial amount of C I wrote was low throughput but needed to reliably receive, process and respond to packets in single-digit microseconds.
I've had good experience with Cython, which compiles python to C and gets almost all of the speedup of rewriting in C entirely. And in fact, most of that speedup just comes from declaring variable types...
Any idea where the speedups came from? Is it (1) that the problems weren't algorithmically limited in the first place (lots of I/O, for example), (2) reduction of overhead etc. (what kind of Python was the code running on before?), or (3) just that the speedup on low-level operations added up cumulatively and came to dominate the other timing factors?

Also, did you change the data structures or use the same ones as in python? Was any of the speed boost data structure related?

Python and similar dynamic languages suffer from the fact that every name access (variable, function, etc.) incurs a dynamic lookup of that name in a (nested) dictionary. Statically compiled languages don’t have this. There are fairly recent, clever optimisations that can avoid many of these lookups, but they are not implemented in any of the common implementations of Python, R, etc. (JavaScript has them though). And even with these optimisations in place we cannot get rid of such lookups altogether, and they kill cache locality and branch prediction.

There are other reasons for slowdown (automatically managed garbage collection is a big one, and so is any kind of indirection, e.g. callbacks). But usually the big one is name lookup.

As a compiler writer, I can tell you that in JS, local variable lookups do not incur any kind of dynamic overhead. The performance of modern JS engines is much closer to C than you might think. Dynamic language optimization is also not so recent. Most of the techniques implemented by modern JS engines were invented for the Smalltalk and Self projects. See this paper from 1991, for example: http://bibliography.selflanguage.org/_static/implementation....

Python is just inexcusably non-optimized. It's a bytecode interpreter, with each instruction requiring dynamic dispatch. Integers are represented using actual objects, with pointer indirection. The most naive, non-optimizing JIT implementation might get you a 10x speedup over CPython. I think that eventually, as better-optimised dynamic languages gain popularity, people will come to accept that there is no excuse for dynamic language implementations to perform this poorly.
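The point about integers being full objects is easy to poke at from the interpreter. The sketch below assumes CPython, and the variable names are only for illustration. It compares the size of a single `int` object with the per-item size of an `array.array`, which stores raw machine integers contiguously.

```python
import sys
from array import array

boxed = [1, 2, 3]            # a list holds pointers to separate int objects
packed = array("q", boxed)   # contiguous signed 64-bit machine integers

int_object_bytes = sys.getsizeof(boxed[0])  # size of one int object (CPython)
packed_item_bytes = packed.itemsize         # bytes per element in the array

print(f"one int object: {int_object_bytes} bytes")
print(f"one packed item: {packed_item_bytes} bytes")
```

Every arithmetic operation on the boxed values has to follow the pointer and check the object's type first, and that indirection is part of the overhead described above.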

I haven’t followed recent development of JavaScript all that closely so my knowledge is somewhat outdated. However, the optimisations that make JS performance close to C in some cases are really recent. Some of the tricks are old, such as the paper you cited. But these tricks only go so far, and in particular even modern GCs simply work badly in memory-constrained environments, which puts a hard upper limit on the amount of memory that JavaScript can handle efficiently. One of the better articles on this subject is [1].

That said, my comment already mentioned that local variable lookup isn’t a problem in JavaScript. It is in R, however; see my example in [2]. Beyond that, both R and Python execution have obvious optimisation potential, which is made hard by the fact that existing libraries rely extensively on implementation details of the current interpreters.

[1] http://sealedabstract.com/rants/why-mobile-web-apps-are-slow...

[2] https://news.ycombinator.com/item?id=11117070

The lookup thing only happens during compilation to byte code or intermediate code, I believe. Once in byte code, there are no variable names, only addresses.
No, unfortunately that is not the case. Lookup happens at execution of the byte code, because variables cannot be looked up at byte compilation. Consider the following case:

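    x = 1
    user_input = "x"  # stand-in for a name that is only known at run time
    local({
        # assign() creates a variable, named by the string, in the local scope
        assign(user_input, 2)
        print(x)  # finds the x created by assign(), not the outer one
    }, envir = new.env(parent = environment()))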
If `user_input` is “x”, the lookup of `x` in the local scope finds a different variable, in a different scope. Hence this lookup needs to take place every time this piece of code is executed.

I’m not sure if Python suffers from similar problems.
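For what it's worth, a rough CPython sketch of the same situation suggests the answer is "partly". Names assigned inside a function are resolved to fixed local slots when the function is compiled, but global names are still looked up in a dictionary each time the code runs. The names `read_x`, `local_only` and `user_input` are made up for this illustration.

```python
def read_x():
    # `x` is never assigned inside this function, so the compiler treats it
    # as a global and looks it up by name every time this line executes.
    return x


def local_only():
    y = 10  # assigned here, so `y` is a local with a slot fixed at compile time
    return y


# __globals__ is the namespace dictionary that read_x consults at run time.
namespace = read_x.__globals__
user_input = "x"  # stand-in for a name that is only known at run time

namespace[user_input] = 1
first = read_x()
namespace[user_input] = 2  # rebind by name, roughly like assign() in R
second = read_x()

print(first, second)
print(read_x.__code__.co_names)         # names resolved by dictionary lookup
print(local_only.__code__.co_varnames)  # names resolved to local slots
```

So the global lookup cannot be resolved once at byte-compile time, while function locals can.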

The lookup thing only happens during compilation to byte code or intermediate code, I believe. Once in VM, there are no variable names. Only addresses.
All from (3). Definitely not io bound, and using standard python 2.7 (if numpy had been applicable, I would have used it...)

My data structures for numerics are generally really simple, and generally I'm able to go from python list/dict/sets to c++ vector/map/set pretty directly.

I for one write all my statistical code in baremetal assembly. I manage about 5 a year, but they all run very quickly. There is no such thing as premature optimization.