by martinsmit 1318 days ago
As a relatively new programmer who entered the field through statistics, I have yet to have a better UX or "it just works" moment than using the tidyverse. Years after moving to Julia, Python, and Rust, I still go back to R to do any tabular data work. Speed isn't an issue, I always have data.table, and I'm productive in a way that I could only hope to be while doing non-tabular data tasks.

RStudio is the perfect IDE. REPL/command-line + Scripts + Plots. I could not be happier using it and I wish I could get VSCode to be half as good. Julia for VSCode is pretty good, but the Python science tooling goes 100% towards notebook environments, which I'm not a huge fan of, so the Python science VSCode experience is subpar.

4 comments

R is great, and so are some of the packages that led to the tidyverse, but I think the latter was a bit too much. Re-inventing what already worked with new packages, always overloading R syntax in weird ways (looking at you, ggplot2). I've actually found myself moving back to base R for many of the more basic manipulation tasks.
Base R is loved only by those who were unlucky enough to spend years using it when there was no alternative.
Base R is far from perfect, but for many basic manipulation tasks it works just as well as the tidyverse. Maybe not with piping, but that doesn't really save anything if you format it readably.

There's something to be said about code that just works out of the box. I don't see the need to maximize dependence on third-party libraries as long as the gains are purely "ergonomic". Especially when the creators have a somewhat mixed record regarding long-term commitment vs re-inventing their own wheel.

The real selling point of R imho isn't the data science tools anyway - for that we already have the amazing Python ecosystem (which the RStudio guys have also tacitly admitted with their rebranding) - but the pure statistics packages. Especially if you need something more niche, to the point that you'd use any language just to get an implementation of a specific model, you'll find yourself coming back to R more than half the time. It's simply the language of choice where most statisticians publish their code.

R has some superior data science tools. For example, the tabular data packages dplyr and data.table have no adequate parallels in the Python world. There are many also-rans but no real rivals.
Or anyone who has tried to re-run R code that was written more than 6 months ago.
I write R full-time using the full suite of tidyverse packages and that's just not an issue these days. Maybe a few years ago.

And anyway, you'll hit the same issue using third-party packages in any language.

data.table has got to be the most underrated library ever. If the problem is not big data, but other libraries struggle, fire up data.table and marvel at the sheer speed of it all. On rare tasks when everything was too slow, I'd drop down to C++. data.table essentially made sure that was pretty much never required.
My workflow for the past year has been to develop/analyze in RStudio and port to Python when ready to deploy to production. Notebooks and VSCode still feel cumbersome to me and not designed as an analytics-first solution.

The time needed to re-write a script in another language, often using different packages, seems more than made up for by the ease of use of RStudio.

I honestly don't understand why we all went this route of "port to Python". I mean we do it too (not my choice) but it really makes no sense to me.
Because software engineers hate R, and nobody really hates Python that much.
I've found the JupyterLab IDE to be ideal for my DS and EDA workflows, which typically involve having several notebooks, scripts, and terminal windows open in the IDE. I switched from preferring R to Python because having a global shared state across everything in RStudio kept switching the working directory or loading a different file than expected, and I just had very little confidence that things I wrote in R would be reproducible a few years on (which appears to have been a correct concern [0]).

[0] https://datacolada.org/100

You can have multiple RStudio instances open for different projects with different states, and you can use renv to manage package versions reproducibly. None of this is different from or worse than Python, which is well-known to have its own environment difficulties.