by wenc 2786 days ago

Echoing other comments, Tidyverse is somewhat more coherent (aided significantly by magrittr's %>% operator). Beginners might get tripped up by Non-Standard Evaluation (NSE), which is a little unintuitive, but there are packages to help with that.

The Pandas API is a generalized solution to complicated, variegated use cases, and its syntax reflects that (it was also hemmed in by the strictures of Python). There are several indexing methods, several ways to slice, and several ways to apply functions, all of which behave slightly differently. Even expert Pandas users have trouble remembering the syntax for all of these, so they typically keep a Pandas API browser window open or a printed cheat sheet pinned to a corkboard. Pandas definitely takes longer to get used to than Tidyverse, but the payoff is that you get to use Python, which is a somewhat "deeper" language than R.
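To make the "slightly differently" part concrete, here is a rough sketch of a few of those indexing and apply variants side by side. The DataFrame and its column names are made up for illustration; the integer index labels are chosen deliberately so that label and position don't coincide.

```python
import pandas as pd

# Hypothetical toy data: index labels (10, 20, 30) differ from positions (0, 1, 2).
df = pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]}, index=[10, 20, 30])

# Indexing: same-looking slices, different semantics.
by_label = df.loc[10:20]       # label-based, end label is included
by_position = df.iloc[0:1]     # position-based, end position is excluded
by_mask = df[df["x"] > 1]      # boolean mask selects rows
column = df["x"]               # a string key selects a column

# "Apply": which object you call it on changes what the function receives.
per_element = df["x"].map(lambda v: v * 2)                    # one call per value
per_column = df.apply(lambda col: col.max())                  # one call per column
per_row = df.apply(lambda row: row["x"] + row["y"], axis=1)   # one call per row

print(len(by_label), len(by_position), len(by_mask))
print(per_element.tolist(), per_row.tolist())
```

The inclusive-versus-exclusive slice end between `.loc` and `.iloc` is the kind of thing that sends people back to the cheat sheet.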

R is great for interactive work, and for data munging jobs that don't interact too much with non-R libraries. However, Python is simply more versatile end-to-end.

I used to start my interactive analysis in R and port to Python for production, but these days I start in Python straight away so there's no impedance mismatch. I've personally found writing production code in Python (and by extension Pandas) to be much more pleasant than in R, even with Tidyverse.