| HN Mirror

Do you think the love-it/hate-it dichotomy over R for data 'munging' stems from different ways of thinking about data? I'm slowly getting comfortable with R since returning to work in a sort of freelance arrangement that makes me highly motivated to use free or affordable tools. I started out, however, in clinical epidemiology data analysis using MS Access and SAS, and I still think of data in terms of rectangular data sets, RDBMSs and SQL. I have a hard time with vector- and matrix-related terminology.

I think I'm going to end up using reshape2 and data.table a lot, since sqldf is noticeably slower even with my small data sets (small compared with web analytics, finance, etc.). The problem sqldf has with variable names containing a dot is a real drag as I try to adopt good coding style. Still, I miss the clarity and familiarity of SQL statements as I try to find my new workflow in R, and I hope a more unified approach to data munging emerges soon.

BTW, I totally espouse the reproducible research (RR) method of documenting study design, analysis, interpretation... I am loving knitr and LaTeX for RR, so I can no longer imagine using different tools for data munging and analysis.
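A minimal sketch of why dotted variable names clash with SQL: sqldf runs its queries through SQLite by default, and SQLite parses an unquoted `visit.count` as "column `count` of table `visit`", not as one column name. Python's built-in `sqlite3` module is used here only as a stand-in for that backend; the table, the column names and the two helper functions are hypothetical.

```python
import sqlite3

# Hypothetical rectangular data set with R-style dotted column names.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE visits ("patient.id" INTEGER, "visit.count" INTEGER)')
conn.executemany("INSERT INTO visits VALUES (?, ?)", [(1, 3), (2, 5), (3, 2)])


def total_visits_unquoted(conn):
    # Unquoted, SQLite reads visit.count as table "visit", column "count".
    # No table named "visit" exists, so the query fails; return the error text.
    try:
        return conn.execute("SELECT SUM(visit.count) FROM visits").fetchone()[0]
    except sqlite3.OperationalError as err:
        return str(err)


def total_visits_quoted(conn):
    # Double quotes turn the whole dotted name into a single identifier.
    return conn.execute('SELECT SUM("visit.count") FROM visits').fetchone()[0]


print(total_visits_unquoted(conn))  # an "no such column" error message
print(total_visits_quoted(conn))    # 10
```

Quoting every dotted name works but clutters the SQL, which is one reason underscore-style names (`visit_count`) tend to sit more comfortably with SQL-based tools.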