Hacker News
by _Wintermute 1556 days ago
> Before R commercial statistical packages were mainly used.

Maybe in your field. I work in bioinformatics, where Perl was widely used as a high-level language before R.

> Regarding keeping versions straight, all past versions of packages in the CRAN repository are kept on CRAN...

This is woefully inadequate if you need to replicate somebody else's environment. Nobody should think that manually guessing each package version, typing it in, and hoping the versions are compatible is a viable option. Not to mention that even if you specify an older version of a package, it doesn't pull in compatible dependencies; it just pulls in the latest versions. There's renv, but it hasn't reached widespread use.
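For anyone who hasn't seen the two approaches side by side, here is a rough sketch. The helper function names and the dplyr version are placeholders I made up for illustration; only the `remotes::` and `renv::` calls are real APIs, and both packages are assumed to be installed already. Everything is wrapped in functions so that sourcing the file installs nothing by itself.

```r
# Manual approach: pin one package to an exact version.
# "dplyr" and "0.8.5" are placeholder values, not taken from the thread.
# As described above, its dependencies are not pinned by this call.
install_pinned <- function() {
  remotes::install_version("dplyr", version = "0.8.5")
}

# renv approach, author side: run once in the project root. This creates a
# project-local library and writes the versions in use to renv.lock.
record_environment <- function() {
  renv::init()
}

# Author side, later: refresh renv.lock after adding or upgrading packages.
update_lockfile <- function() {
  renv::snapshot()
}

# Reader side: reinstall the versions listed in the project's renv.lock.
restore_environment <- function() {
  renv::restore()
}
```

The catch raised in this thread still applies: the lockfile only exists if the original author created one.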

> Regarding tidyverse dependencies you can reduce the number of packages you load by not using library(tidyverse) and instead load the specific packages you need. This will result in fewer packages being loaded

We're talking about replicating other people's work. We don't have any control over their code, and R users are largely ignorant of software best practices.

2 comments

Totally agree. I find it frustrating trying to reproduce other people's work in R. How has this situation been allowed to continue for so long? It's unacceptable, especially when R is used for science. It's impossible to replicate anything unless you are lucky enough to find which package version introduced the breaking change, and even then you have to repeat that for every piece of code that breaks. Even _renv_ is a library you have to install within your R environment, which is pointless. Where is a dependency solver like conda for R? Not that conda is perfect, but I've recently been happy with its drop-in replacement, mamba.

The packages that were used in statistics were SAS, SPSS and Stata. Perl is not a statistical package and has nowhere near the depth of statistical capabilities of R.

Don't forget that I also mentioned the checkpoint package in my post. You only need to know the date for that, not the version of each of the packages.
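A minimal sketch of the date-based approach, for reference. The date below is a placeholder, and this assumes the checkpoint package is installed and that a dated CRAN snapshot mirror is reachable for that date; the call is guarded so the script still loads without the package.

```r
# Placeholder date: the day the original analysis is believed to have run.
snapshot_date <- "2020-03-01"

# Skip quietly if checkpoint is not installed on this machine.
if (requireNamespace("checkpoint", quietly = TRUE)) {
  # Scans the project for library()/require() calls, installs those packages
  # as they existed on the snapshot date into a separate library, and points
  # the session at that library.
  checkpoint::checkpoint(snapshot_date)
}
```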

In your last paragraph I think you are referring more to software development practices than what is available through R. Simply using R or any language doesn't guarantee this.

That's a very roundabout way to solve an actual problem. In many cases you don't pin your package versions to _latest_ (whatever that date is), and you need a more fine-grained way of pinning package versions. I don't think that solves this, and I don't know if you can do it with checkpoint.

Of course it is possible to screw up, but if you don't update your packages and record the date, that does not seem to be R's fault.