by jwilbs 2614 days ago
This. I’ve contributed code to popular libraries in both languages, and while I prefer Python overall (mostly because it is general purpose), I find R unparalleled for raw data manipulation and analysis.

The overall API of the tidyverse packages is such a joy, and recent improvements in purrr/tidyr let me construct nested data analysis workflows I couldn’t even dream of in Python.

2 comments

One random example I found recently is a tidyverse package called forcats, which has lots of nice functions for categorical data. For example, it has a single function that merges every category whose frequency falls below a certain threshold into a new category such as "Other". This is a task I often need to do, but as far as I can see it takes a bit of a hack in Python or pandas. It's lots of little things like this, especially when wrangling data tables.

https://forcats.tidyverse.org/reference/fct_lump.html
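
To make the "lump rare categories" idea concrete, here is a minimal plain-Python sketch of it. This is not how forcats implements it, and the function name `lump_rare`, its `min_count` and `other` parameters, and the sample data are all made up for illustration.

```python
from collections import Counter


def lump_rare(values, min_count, other="Other"):
    """Replace every category seen fewer than min_count times with `other`.

    Hypothetical helper, written only to illustrate the idea.
    """
    counts = Counter(values)  # frequency of each category
    return [v if counts[v] >= min_count else other for v in values]


# Made-up sample data: "green" and "teal" each appear only once.
colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
lumped = lump_rare(colors, min_count=2)
print(lumped)
```

With pandas the same idea would likely go through `value_counts()` and a boolean mask, which is roughly the "bit of a hack" mentioned above.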

There's also the data.table package for this kind of data work, which is maybe less used but seems to have better performance.

Would you have an example of that?