Hacker News
by czep 3113 days ago
> Python with sklearn and pandas blows R out of the water.

For some things yes, but for others the reverse is true. I'm also a heavy R and python user and find the two ecosystems extremely complementary. For building pipelines and web apps, python has an edge. For statistics, graphics, and data management, R is IMO superior. You can do everything in either language, but you have to jump through hoops in some cases. Sometimes the best solution is to use both!

For example, I run an internal web app for A/B testing using django and rpy2. Doing it all in python would have been sub-optimal because dataset management is so much simpler in R. Plots that were easy to do in ggplot2 were impossible to get right in matplotlib. The big drawback to this method is R's single-threaded architecture. Embedding R in a web server process is not easy (ask me!), and won't scale as well as a multi-threaded environment can.
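To make the threading constraint concrete, here is a minimal sketch of one common workaround: funnel every call into the embedded interpreter through a single worker thread. This is not the setup described above, just an illustration. `run_r_analysis` is a hypothetical stand-in for whatever rpy2 call the app would make, and `handle_request` and the numbers are made up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker thread: every job submitted here runs on the same thread, in order.
r_executor = ThreadPoolExecutor(max_workers=1)

def run_r_analysis(visitors, conversions):
    # Hypothetical stand-in for a call into embedded R (e.g. via rpy2).
    # It computes a conversion rate and records which thread ran it.
    return {
        "rate": conversions / visitors,
        "thread": threading.current_thread().name,
    }

def handle_request(visitors, conversions):
    # Called from any web-server thread; blocks until the single
    # R worker thread has processed this job.
    return r_executor.submit(run_r_analysis, visitors, conversions).result()

# Simulate four concurrent request threads handling three requests.
with ThreadPoolExecutor(max_workers=4) as request_pool:
    results = list(request_pool.map(handle_request, [200, 200, 200], [10, 20, 30]))

threads_used = {r["thread"] for r in results}
rates = [r["rate"] for r in results]
r_executor.shutdown()
```

The trade-off is visible in the design: requests are safe but queue behind each other, which is the scaling limit mentioned above.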

All my data exploration and prototyping happens in R. Even basic report scripting can be done better in R than python because of the ease of data management. Consider a typical case of 1) run database query, 2) munge data around to produce a table, and 3) email or save to html. If you can't get exactly what you want from the database in one query and you have to do a lot of munging in step 2, then R is going to be more flexible than python. If I need to merge, aggregate, or recode variables, I would much rather use R. Doing all this with a list-of-lists "dataset" in python is convoluted at best, and it means recreating a lot of the functionality that base R gives you.
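As a rough illustration of the three-step report with a list-of-lists "dataset", here is a sketch using only the Python standard library. The `visits` table, its columns, the `labels` lookup, and all rows are invented for the example; nothing here comes from the actual app.

```python
import sqlite3
from collections import defaultdict
from html import escape

# Step 1: run a database query (in-memory SQLite with made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, variant TEXT, converted INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "a", 0), (2, "a", 1), (3, "b", 1), (4, "b", 1), (5, "b", 0)],
)
query = "SELECT user_id, variant, converted FROM visits ORDER BY user_id"
rows = [list(r) for r in conn.execute(query)]  # the list-of-lists "dataset"
conn.close()

# Step 2: munge by hand. Recode the variant via a lookup (a manual merge),
# then aggregate visitors and conversions per group.
labels = {"a": "control", "b": "treatment"}  # hypothetical second dataset
totals = defaultdict(lambda: [0, 0])  # label -> [visitors, conversions]
for user_id, variant, converted in rows:
    label = labels[variant]
    totals[label][0] += 1
    totals[label][1] += converted

table = [
    [label, n, conv, round(conv / n, 3)]
    for label, (n, conv) in sorted(totals.items())
]

# Step 3: save to HTML (here just built as a string).
header = ["group", "visitors", "conversions", "rate"]

def html_row(cells, tag):
    return "<tr>" + "".join(f"<{tag}>{escape(str(c))}</{tag}>" for c in cells) + "</tr>"

html = "\n".join(
    ["<table>", html_row(header, "th")]
    + [html_row(r, "td") for r in table]
    + ["</table>"]
)
```

The merge, recode, and aggregate steps are each hand-written here, which is the overhead the paragraph above is pointing at.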

1 comment

Do you not use pandas?