by professionalguy 1815 days ago
That’s cool and everything, but I don’t know many DS people who still use R. Maybe academics still do?
9 comments

R is overwhelmingly used in bioinformatics. There is nothing quite like Bioconductor. Most new tools/methods (for example, in scRNA-seq) release R packages first.
Well, I'd say conda is quite like Bioconductor in how easy it makes installing the relevant packages. scRNA-seq has popular R packages like Seurat but also popular Python packages like scanpy.
I didn't understand your comment, which is probably my fault.

But you can absolutely install many Bioconductor packages from conda.

I love using conda as my environment manager rather than compiling and installing 1000 different libraries and tools.

Also, I install mamba for drastically faster resolution of the dependencies.

Really depends on the application. For clean, concise and reproducible ad hoc statistical analytics and modelling, there isn't a better tool than tidyverse+tidymodels.

It's a classic case of the best tool for the job. I usually create simple stuff in R and then move to bigger datasets and production in py+spark.

tidyverse may be clean, but it is nowhere near as concise as data.table.

data.table is also typically orders of magnitude faster.

Thanks for the point of view, I can't argue since I don't really know data.table. Will check it out!
Check out dtplyr: a lazy data.table backend with tidyverse syntax.
Although I agree and don't like R very much, I believe ggplot is still the gold standard for creating top-quality visualizations. None of the python (or other language) clones are quite as good. For projects where the end goal is a complex or detailed graph or plot, it's sometimes worth trudging through R to achieve the best final result.
Yup, not a data scientist but often do data processing to analyze the outcome of experiments (drone-flight related). I'll use Python/Jupyter if there's a significant amount of clean-up that needs to happen, but R/ggplot is unbeatable if I'm trying to look at the data from different perspectives. As an example, I was trying to look at GPS data the other day and ggplot() + geom_point() + geom_density_2d() was an absolutely perfect way to better grok what was going on.
I've used Python since 1995, so I should be biased, but switching from Python to R is a huge productivity boost - like switching from Excel to Python. R is just years ahead.
R sees significant use both in academic/research settings and industry.
It's the dominant language among academic statisticians.
There are a lot of DS folks using it for Bayesian statistics.
It’s sadly still quite popular in the research world
What about it makes you sad?
Not the original poster, but the language has some really weird edges. For instance, check out this wild behavior: http://www.hep.by/gnu/r-patched/r-lang/R-lang_41.html
This is because R borrows a lot of syntax from S. When R came out, statisticians were using S, so it was natural to make it like this. If they had gone another way, you'd have had statisticians on mailing lists 20 years ago bemoaning how unlike familiar S it was, rather than regular old programmers today bemoaning that R isn't like familiar Python, which is what happens on HN whenever there is an R thread.
I think the behavior is so wildly inconsistent that it's not really justifiable, regardless of being a statistician or not: https://github.com/tidyverse/design/issues/13#issuecomment-4...
I mean compared to other languages these sorts of quirks might seem like big deals, but they rarely come up. You see that error, you copy paste and find a stack overflow thread explaining it, you know what to do next time and move on. R is certainly no C.
As far as weird edges go, that one is really, really mild. It may even be considered a good idea!

For people interested in weirder things, check The R Inferno (I think it's somewhat outdated by now, though):

https://www.burns-stat.com/documents/books/the-r-inferno/

That book isn't so much about R weirdness. It's more about teaching data scientists to consider the implications of practices like copying a huge table in memory on every loop iteration.
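
To make the "copying on every loop iteration" point concrete, here is a rough Python analogue (a sketch, not an example from the book). A plain list stands in for the table, and the function names `build_by_copy` and `build_in_place` are made up for illustration. The R equivalent would be growing a data frame with `rbind()` inside a loop.

```python
import time


def build_by_copy(n):
    """Grow a list by concatenation, which copies it on every iteration."""
    rows = []
    for i in range(n):
        # `rows + [i]` allocates a new list and copies every existing
        # element into it, so total work grows quadratically with n.
        rows = rows + [i]
    return rows


def build_in_place(n):
    """Grow a list in place, without copying the existing elements."""
    rows = []
    for i in range(n):
        # append() extends the existing list; amortized constant time.
        rows.append(i)
    return rows


if __name__ == "__main__":
    n = 20_000  # arbitrary size, chosen only to make the gap visible
    for build in (build_by_copy, build_in_place):
        start = time.perf_counter()
        build(n)
        elapsed = time.perf_counter() - start
        print(f"{build.__name__}: {elapsed:.4f} s")
```

Both functions return the same list; only the amount of copying differs, and the gap widens as `n` grows.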
Idiosyncrasies are not something unique to R.

One could express the same surprise at an empty list being considered false in some contexts.
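
For anyone who hasn't run into it, a minimal Python sketch of that behavior (the helper name `describe` is made up for this example):

```python
def describe(items):
    """Report how a list behaves when used as a condition."""
    # An empty list is falsy, so this branch is skipped for [] even though
    # [] is a perfectly valid list object.
    if items:
        return "truthy"
    return "falsy"


empty = []
print(describe(empty))   # the empty list
print(describe([0]))     # a non-empty list whose only element is itself falsy
print(empty is None)     # an empty list is not None
print(empty == False)    # nor is it equal to False
```

So `if items:` can't tell "no list was given" apart from "an empty list was given"; code that cares about the difference has to check `items is None` explicitly.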

R absolutely dominates some of the life sciences. For example, most of the state-of-the-art bioinformatics tools are in R.