Hacker News
by baron_harkonnen 1746 days ago
I used to agree with you very strongly re: matplotlib, but I've recently switched from using ggplot2 almost exclusively to using matplotlib almost exclusively, and my realization is that they are very different tools serving very different purposes.

ggplot2 is obviously fantastic and makes beautiful plots, and very easily at that. However, it is definitely a "convention over configuration" tool. For 99% of the typical plots you might want to create, ggplot2 is going to be easier and look nicer.

However, matplotlib really shines when you want to make very custom plots. If you have a plot in your mind that you want to see on paper, matplotlib will be the better tool for helping you create exactly what you are looking for.

For certain projects I've done, where I want to do a bunch of non-standard visualizations, especially ones that tend to be fairly dense, I prefer matplotlib. For day-to-day analytics, ggplot2 is so much better it's ridiculous. The real issue is that Python doesn't really offer anything in the same league as ggplot2 for "convention over configuration" type plotting.

Fully agree on Pandas. R's native data frame + tidyverse is worlds easier. Pandas' overly complex indexing system is a persistent source of annoyance no matter how much I use that library.
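
One common illustration of the indexing complaint (my own example, not necessarily the case the commenter had in mind): after filtering, a DataFrame keeps its original row labels, so label-based and position-based lookups stop agreeing. A minimal sketch, with a made-up column `x`:

```python
import pandas as pd

# Hypothetical toy data; the column name "x" is only for illustration.
df = pd.DataFrame({"x": [10, 20, 30, 40]})

# Filtering keeps the original row labels (1, 2, 3), not 0, 1, 2.
subset = df[df["x"] > 15]

# Position-based access: the first remaining row.
first_by_position = subset["x"].iloc[0]

# Label-based access: label 0 was filtered out, so this raises KeyError.
try:
    subset.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# Dropping the old labels restores a plain 0..n-1 index.
renumbered = subset.reset_index(drop=True)
```

Whether this counts as "overly complex" is a matter of taste, but it is the kind of thing that has no direct equivalent in a tibble, which has no row labels to carry around.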


> Fully agree on Pandas. R's native data frame + tidyverse is worlds easier. Pandas' overly complex indexing system is a persistent source of annoyance no matter how much I use that library.

Is it just the syntax/readability that annoys you, or are there actually problems that take n more steps to solve in Pandas?

I spend more time working around pandas' strange -isms than it takes me to write vanilla Python that does the same thing. The index problems are not just small annoyances, and they can sometimes waste hours because of awkward defaults. For example, the default in df.to_csv is to write the index (without a column name...)! It doesn't make any sense to me whatsoever that reading a CSV and then writing it back out would add a new column. I'm really tired of rerunning pandas code after I forget to turn that stupid default index setting off. Is that a small thing? Sure. But pandas has tons of small things like that.
It's funny you complain about the index being saved in csv files, which is the default behaviour in R.
Not in tidyverse, but yeah, indexes in pandas are a souped-up version of rownames in base R.
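
For anyone who hasn't hit the `df.to_csv` round trip complained about above, here is a minimal sketch of it. The CSV contents and variable names are made up, and `io.StringIO` stands in for real files so the snippet is self-contained:

```python
import io
import pandas as pd

# Hypothetical two-column CSV, read from a string instead of a file.
original = pd.read_csv(io.StringIO("a,b\n1,2\n3,4\n"))

# Default behaviour: index=True, so the row labels are written
# as an extra first column with an empty header.
with_index = original.to_csv()
round_tripped = pd.read_csv(io.StringIO(with_index))

# Passing index=False leaves the columns as they were.
without_index = original.to_csv(index=False)
clean = pd.read_csv(io.StringIO(without_index))

print(list(round_tripped.columns))
print(list(clean.columns))
```

On reading the file back, pandas names the header-less column `Unnamed: 0`, which is the "new column" from the complaint. The other way to avoid it is to read with `index_col=0`, which treats that first column as the index again instead of as data.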