Hacker News new | ask | show | jobs
by vegabook 4223 days ago
yes I have moved (back) to Python mainly because R is too slow when we get beyond a certain data size and the language is not powerful enough when data starts having to be moved around at scale. I have a 5-10 times speed improvement in native Python and another 30x more if I can vectorize things in Numpy. However a huge caveat is that R is much more succinct when it comes to exploratory analysis during what I call the "data rotation" phase because its vectorized nature is so much more efficient at selecting, reducing, cleaning and rotating data, than even Pandas can manage. It's irritating having to write list comprehensions constantly for what would often have been a ridiculously direct and efficient vectorized command in R. Moreover R's graphics leave matplotlib in the dust, though this advantage is eroding with the JS libraries taking over.

The other area where Python crushes R is if your data is live streaming. Here you inevitably need a full fledged programming language with proper asynchronous io capabilities and multithreading / multiprocessing that is not batch oriented.

1 comments

Can you give an example of the a somewhat complicated vectorized command in R that would require lots of list comps in Python?