Hacker News
by minimaxir 2610 days ago
I use both Python and R. tidyverse/ggplot2 alone are enough reason to use R, and in my opinion tasks that fit those packages go substantially faster than with the equivalent in Python.

I haven't had as much reason to use base R, though. For more ML-related tasks I do go back to Python.

4 comments

Hear, hear. Tidyverse also provides a centralised 'this is how you do X' nexus, which really helps discoverability. World-class stuff, on tap.

For example, I know the recommended pipe in R is magrittr's %>%. I have no idea what the respectable pipe library in Python is, or even if there is one.
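For what it's worth, a pipe can be approximated in plain Python without any library. The sketch below is only an illustration: `pipe` is a hypothetical helper name, not a standard or recommended package. (pandas does ship a `DataFrame.pipe` method for chaining functions over a data frame, which may be the closest built-in analogue.)

```python
from functools import reduce


def pipe(value, *funcs):
    """Pass `value` through each function in `funcs`, left to right.

    Hypothetical helper, loosely imitating x %>% f %>% g in R.
    """
    return reduce(lambda acc, f: f(acc), funcs, value)


# Sort the numbers, scale each by 10, then sum the result.
result = pipe(
    [3, 1, 2],
    sorted,
    lambda xs: [x * 10 for x in xs],
    sum,
)
print(result)  # 60
```

It reads top to bottom like a %>% chain, though every step that takes extra arguments needs a lambda, which is part of why it feels less natural than in R.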

I wouldn't even know where to start finding all the tidyverse equivalents in Python. It isn't as organised and obvious as the R statistics community.

On the other hand, base R is the worst. Disgusting language.

Julia has a |> operator and it works amazingly well with Queryverse.jl which is a clone of Tidyverse!

This. I’ve contributed code to popular libraries in both languages, and while I (overall) have a preference for python (mostly due to it being general purpose), I find R code unparalleled when it comes to raw data manipulation/analysis.

The overall api of tidyverse packages is such a joy, and recent improvements in purrr/tidyr allow me to construct nested data analysis workflows I couldn’t even dream of in python.

One random example I found recently is a tidyverse package called forcats that has lots of nice functions for categorical data. For example, it has a single function that merges all categories with a frequency of less than a certain threshold in the table into a new category like "other" or whatever. This is a task I often need to do, but as far as I can see it's a bit of a hack in python or pandas. It's just lots of little things like this, especially wrangling data tables.

https://forcats.tidyverse.org/reference/fct_lump.html
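To make the "lump rare categories" task concrete, here is a rough plain-Python sketch of the idea. It is not a port of forcats: `lump_rare`, `min_count` and the "Other" label are names made up for illustration, and the threshold here is a raw count rather than whatever rule a given forcats function applies.

```python
from collections import Counter


def lump_rare(values, min_count, other="Other"):
    """Replace every category seen fewer than `min_count` times with `other`.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
lumped = lump_rare(colors, min_count=2)
print(lumped)
# ['red', 'red', 'red', 'blue', 'blue', 'Other', 'Other']
```

In pandas the same thing can be done by combining `value_counts()` with `where()`, but you still have to assemble it yourself rather than call one named function.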

There's also the data.table package for this kind of data work, which is maybe less used but seems to have better performance.

Would you have an example of that?

Seconded on all points. I also branch out to SQL for some things, and I find that both R and Python play nicely with it. But as long as ggplot exists and Python doesn't have it, R will never really leave my side.

I'm finding a fairly nice combo is using rmarkdown, python and reticulate: do the things that are easier in python there, and produce the outputs in R. Debugging isn't where I'd like it to be yet, but there might be a way of improving that which I haven't explored yet.