Hacker News
by Mairoce 4 hours ago
Frankly, the bigger problem is an overreliance among R instructors on the tidyverse, an ever-expanding ecosystem of redundant functions and anti-patterns. They’re teaching new R users that everything can be solved with yet another package import and skipping over teaching them how to use the already powerful and intuitive base packages.

I’m not saying it doesn’t have flaws, but the tidyverse is still the most coherent and functional ML/stat computing ecosystem I’ve ever used. R packages outside of the tidyverse can get pretty gnarly. Even the R stdlib is usually considered to be inconsistent and riddled with legacy cruft.

I may be in the minority, but I don't like the tidyverse ecosystem. I prefer data.table for most of my uses.

It's certainly quite pleasant to work with, but I would rather use SQL for ETL and let the backend be whatever it needs to be.

Real-world data transformations can get gnarly very quickly, and SQL is the perfect common denominator, whereas dplyr is still niche.

How do you feel about polars?

I’m a big fan of Polars. It’s really fast and memory efficient. With the lazy streaming functionality, I’ve been able to easily process 1 TB+ of data on a single machine (you do have to be careful not to do any operation that would force the whole DataFrame to materialize in that case).
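The reason streaming keeps memory flat is that filters and aggregations can be applied as data flows past, so only small running state (here, one total per group) is ever held. The sketch below is a minimal stdlib Python stand-in for that idea, not Polars' API (Polars expresses this with its lazy queries, e.g. ones starting from `pl.scan_csv`); the column names `key` and `value` and both helper functions are hypothetical. It processes one row at a time, whereas Polars' streaming engine works on batches, but the memory argument is the same.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, Iterator


def read_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed CSV row at a time; only the current row is held."""
    yield from csv.DictReader(lines)


def streaming_group_sum(rows: Iterator[dict], min_value: float) -> dict:
    """Filter, then sum per group in a single pass.

    Memory grows with the number of distinct keys, not the number of rows.
    Hypothetical columns: 'key' (group label) and 'value' (numeric).
    """
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        value = float(row["value"])
        if value >= min_value:  # filter before anything is stored
            totals[row["key"]] += value
    return dict(totals)


if __name__ == "__main__":
    # In practice this would be open("big.csv", newline=""); a tiny sample stands in here.
    sample = io.StringIO("key,value\na,1\nb,5\na,3\nb,0.5\n")
    print(streaming_group_sum(read_rows(sample), min_value=1.0))
```

By contrast, something like a global sort needs to see every row before it can emit the first one, which is the kind of step that forces the whole table into memory.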

It’s certainly miles better than pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use it for all new work, and we have also swapped out pandas for Polars in critical spots of our existing code; the latter gave a huge benefit relative to the amount of work it took.

I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL and much easier to do in Python or another general-purpose language.

I couldn’t disagree more. The base packages are a complete mess. If R had been subset to only the tidyverse 5 years ago, it wouldn’t have lost so much ground to Python in nearly all fields.

Posit is obviously the only organization with the pull to do that, and I feel like they got pulled in 10 directions by the move to AI while also trying to support Python. R Shiny is dead too, which sucks, because reflex.dev just copied them and ate their lunch in 3 months.

The proof is in the pudding. Every single grad student of mine that was brought up on the tidyverse produces gigantic R markdown files with 20 imports to accomplish something that would be shorter and much much easier to understand (and review!) with a base package or with one of a small number of packages (box, data.table) designed by people who understand programming.

Not to mention the ridiculous styling/formatting of most tidyverse users' code, which Wickham and others seem to promote. One of the reasons R has lost ground to other languages recently is that most R code these days is ugly.

That was always my struggle with tidyverse vs. base R mastery. In the Looney Tunes cartoon of the Road Runner vs. the Coyote, the Coyote used the tidyverse and the Road Runner used base R.

> The proof is in the pudding. Every single grad student of mine that was brought up on the tidyverse produces gigantic R markdown files with 20 imports to accomplish something that would be shorter and much much easier to understand (and review!) with a base package or with one of a small number of packages (box, data.table) designed by people who understand programming.

The fact that young people are producing suboptimal code (by whatever optimization criterion you are choosing; here, it sounds like terseness) is not strong evidence that a particular software ecosystem (tidyverse) is flawed. Young people producing bad code is not surprising. They're your grad students, mentor them, and maybe they'll adapt to your ways of thinking. Or not.

> One of the reasons R has lost ground to other languages recently is that most R code these days is ugly

Citation needed, surely. This article is about an increase in the number of CRAN submissions, and pseudo-quantitative indices like the TIOBE index show R's slice of the pie growing; both provide evidence to the contrary.

> Young people producing bad code is not surprising. They're your grad students, mentor them, and maybe they'll adapt to your ways of thinking. Or not.

You’re right, mentorship is key, and I do my best to suggest better practices. They are often quite happy to find out they can do more with less and no longer have to remember multiple additional syntaxes (looking at you, ggplot2).

I somewhat understand why R instructors lean toward the tidyverse: Wickham’s group produces a ton of tutorials and workbooks, so it’s easy to just point students there. But it has led to entire cohorts of people producing poor code.

Data.table is a masterclass in bad API design. Its lack of success despite its technical merits is entirely of their own doing.

Python is just such a good Swiss army knife and it's never a waste to learn: you can do data science and you can do almost anything else. It's the BASIC of the 21st century.