Hacker News
by jasonpbecker 3863 days ago
I couldn't disagree more. R is great at munging pretty much everything but unstructured textual data. The tools are definitely behind Python if you're dealing with literal written documents.

I don't know anyone who considers themselves a "data scientist" of any sort who doesn't view their job as 80% or more data wrangling/munging/cleaning.

I write production ETL processes in R at my current job. AMA.


May I ask what tools you favor in the R environment? I just haven't found anything as performant for operations on irregular and poorly formatted time series as the pandas library, and in fact I just finished an ETL in pandas for my current job.

I'm always interested in learning a new tool, though.
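
To make "irregular and poorly formatted time series" concrete, here is a minimal pandas sketch of that kind of cleanup. The column names (`when`, `value`), the sample rows, and the 60-minute grid are all invented for illustration and are not taken from the ETL job mentioned above.

```python
import pandas as pd

# Hypothetical raw feed: out-of-order rows, one unparseable timestamp,
# one duplicated timestamp, and one non-numeric reading.
raw = pd.DataFrame({
    "when": ["2015-03-01 00:05", "2015-03-01 02:40", "not a date",
             "2015-03-01 00:05", "2015-03-01 01:10"],
    "value": [1.0, 3.0, 9.0, 1.5, "n/a"],
})

def regularize(frame: pd.DataFrame, freq: str = "60min") -> pd.Series:
    """Coerce a messy (when, value) frame onto a regular time grid."""
    cleaned = pd.DataFrame({
        # Unparseable timestamps become NaT instead of raising.
        "when": pd.to_datetime(frame["when"], errors="coerce"),
        # Non-numeric readings become NaN instead of raising.
        "value": pd.to_numeric(frame["value"], errors="coerce"),
    })
    cleaned = (
        cleaned.dropna(subset=["when"])                   # drop rows with no usable timestamp
               .drop_duplicates(subset="when", keep="last")  # last row wins on repeats
               .set_index("when")
               .sort_index()
    )
    # Average readings into fixed-width bins; bins with no valid reading stay NaN.
    return cleaned["value"].resample(freq).mean()

hourly = regularize(raw)
print(hourly)
```

Gaps are left as NaN on purpose here; whether to interpolate, forward-fill, or drop them depends on the data, so that choice is left to the caller.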

I don't work much with data that would benefit from being very strict about datetimes as a dimension. I'd have to know a bit more about what was challenging before I could confidently recommend anything for your particular case. My email is on my profile and I'd be happy to chat there if that would be helpful.

I have largely avoided ts, zoo, etc. where possible. Time series work seems to have a lot of specialized tooling, all of which tends to be much stricter about data structure than I'm comfortable with for my workflow.