by dan_yall 4937 days ago
I've found Python + Pandas much better in this regard than R. Maybe it's just me, but for grouping, indexing, and manipulating tabular data, Python syntax just makes more sense.

That said, R is better for stats and matrix operations.

2 comments

Are you using Pandas? If so, your comment would be ironic because pandas borrows heavily from R ;)
They might have borrowed from R. Wes McKinney admits to being influenced by R, especially its data frames... but data analysis is all the easier when I can do everything I want within the Python environment. pandas is proving to have a bit of a longer learning curve, I must admit, but the Python environment and native Matplotlib support have made life oh so much simpler.

That's just me though.

What has pandas borrowed from R, other than a 2D data structure with heterogeneously-typed columns?

I guess the data frame merge invocations are similar.

(I know patsy/statsmodels are introducing R's formula syntax to python, but that's not pandas.)
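
To make the merge comparison concrete, here is a minimal sketch of a left join in pandas. The tables and column names are made up for illustration; the call is roughly what R's merge(patients, visits, by = "id", all.x = TRUE) would do.

```python
import pandas as pd

# Hypothetical example tables; names and values are invented for illustration.
patients = pd.DataFrame({"id": [1, 2, 3], "age": [34, 51, 27]})
visits = pd.DataFrame({"id": [1, 1, 3], "sbp": [120, 118, 135]})

# Left join on "id": every patient is kept, even those with no visit.
# Patients without a matching visit get NaN in the "sbp" column.
merged = pd.merge(patients, visits, on="id", how="left")
print(merged)
```

The how argument plays the role of R's all, all.x and all.y flags: "inner", "left", "right" and "outer" cover the same four cases.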

The split-apply-combine framework for dealing with group-by tasks (http://www.jstatsoft.org/v40/i01/paper, not that there aren't other precedents), for one. But generally, Wes has used R to figure out what people want to do, and then ported an elegant interface to Python.
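
As a rough illustration of split-apply-combine in pandas (a sketch on toy data, not taken from the paper), groupby does the split, the aggregation is the apply step, and pandas combines the pieces back into one result:

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Split by "group", apply a mean to each piece, combine into one Series
# indexed by group label.
means = df.groupby("group")["value"].mean()

# transform keeps the original row count: each row receives its own
# group's mean, so the subtraction centers values within each group.
df["centered"] = df["value"] - df.groupby("group")["value"].transform("mean")

print(means)
print(df)
```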
I would agree with you if it wasn't for the data.table package in R. It is a game changer. Really.
Can you elaborate on data.table being 'a game changer'? I am inclined to agree, but I am just starting to get a handle on it. I am still hesitant, and I switch between sqldf, reshape2, base::merge and data.table more than I would like. Do you think it could become a dominant method for data preparation?
Python has PyTables, which complements pandas nicely and seems to offer the same sort of features as data.table (note: I've not actually used data.table).
I am using R to analyse and document (knitr and LaTeX) epidemiologic data, which does not involve parsing a lot of text to extract my analysis data set. Data preparation for this type of research is more about combining data from different source tables, restructuring repeated measures, etc. I only know how to do that using R. Can Python be incorporated into the knitr literate programming framework, and is it worth learning another language?
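
For a sense of what the repeated-measures restructuring might look like on the Python side, here is a minimal sketch using pandas' melt to go from one row per subject to one row per subject and visit. The table and its column names (sbp_visit1, sbp_visit2) are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per subject, one column per visit.
wide = pd.DataFrame({
    "id": [1, 2],
    "sbp_visit1": [120, 140],
    "sbp_visit2": [118, 135],
})

# Wide to long: one row per (subject, visit) pair.
long_df = wide.melt(id_vars="id", var_name="visit", value_name="sbp")

# Turn the old column names into a numeric visit index.
long_df["visit"] = (
    long_df["visit"].str.replace("sbp_visit", "", regex=False).astype(int)
)

# Order rows by subject, then visit, and renumber the index.
long_df = long_df.sort_values(["id", "visit"]).reset_index(drop=True)
print(long_df)
```

This is roughly the job that melt in reshape2 does in R.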
Python will be better supported in knitr in future; for now it only has preliminary support: http://yihui.name/knitr/demo/engines/