by dongobread 724 days ago
I'm not sure what would lead you to believe this. I've worked in the data science/ML space for over a decade now and I see the majority of pure analytics projects started in R, including at big tech companies I've worked at recently.

Of course, ML projects and other things that need to result in production-grade models are almost always done in Python. This is currently the most visible form of "data project" due to all the ML/AI hype, but it is far from the only data work going on.


I'm curious about the people who use R at the big tech companies you've worked at. Were the R users people who had just come out of school and were still working in their academic dev environment before weaning off it?

I always found that was the group who used R - kind of a use what you are used to until it gets out of step with the remaining workflow.

I also would say that the amount of R I see is far less than python.

So, (speaking as someone who started with R and now predominantly writes Python), I think there's a bunch of things going on here.

1. R is 100% better for analytics work and statistical modelling. There's just no contest.

2. Python is much, much better for data getting (APIs/scraping etc) and dealing with non table-like data. Again, there's basically no contest here.

3. Software engineers hate R (in most cases), which means that it's easier to hand over work for production in Python.

This leads to a situation where it looks like most of the prod-level work is being done in Python, but if you look under the covers you'll discover that most prototyping/analysis/exploration is done in R and then ported to Python if it works.

Like, Python is a great language for lots of things, but it's pretty terrible for exploratory DS work (pandas is like the worst features of base R and base Python mashed together in an unholy hybrid).

There's also the fact that all the NN stuff is predominantly Python, so lots of companies believe that they need Python people, which reinforces the stereotype.

And finally, while I love R, Python has more guardrails, and it's harder to make an unmaintainable mess with it (relative to R). Particularly when people use all the various lazy evaluation packages that the tidyverse has used over the past decade (I once maintained a codebase that used all of these in different places, it was not a fun experience).

One of the better comments in this thread. I would only qualify that different levels of ability mediate much of the "how hard is it to make an unmaintainable mess" dimension. dplyr/tidy code can be pasta, as can pandas, and there is really a whole new level of that given LLM-generated nonsense edited/tweaked by novices masquerading as seniors.

Apropos this idea of a VS Code competitor, I wish they would spend more effort on existing products. I find Quarto frustratingly buggy and meanwhile see no reason to move my workflow from VS Code to this new thing. YMMV.

> I would only qualify that different levels of ability mediate much of the "how hard is it to make an unmaintainable mess" dimension

Oh definitely, but at least Python's stdlib is relatively consistent, which helps packages be a little more so.

My favourite example is t.test, which is not a t method for the test class, unlike summary.lm which is.

And there's like 4 different styles of function naming in base & stats alone.

Python has problems (for god's sake, why isn't len a method?) but it's a little more consistent.
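For anyone who hasn't run into the len complaint: len is a builtin function that delegates to a `__len__` special method on the object, rather than being called as `x.len()`. A minimal sketch, where `Playlist` is a made-up class purely for illustration:

```python
class Playlist:
    """Hypothetical container, used only to illustrate the protocol."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # The builtin len() looks up this special method on the type.
        return len(self._tracks)


playlist = Playlist(["intro", "verse", "outro"])
size = len(playlist)        # the builtin function form
same = playlist.__len__()   # what it delegates to; not idiomatic to call directly
```

So the method does exist, it's just spelled with double underscores and reached through a free function.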

I used to think that R was responsible for a lot more of the mess than I now do, having seen the same kind of DS code (and I am a DS) written in both Python and R.

And it would be sweet if R had a pytest equivalent. If I never have to write self.assertEqual again, it'll be too soon.
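To make the self.assertEqual gripe concrete, here's the same check written both ways. A rough sketch: `mean` is a toy helper made up for the example, and the bare function relies on the `test_` naming that pytest collects by default.

```python
import unittest


def mean(values):
    """Toy function under test (hypothetical, for illustration only)."""
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    # unittest style: a class plus assertion methods on self.
    def test_mean(self):
        self.assertEqual(mean([1, 2, 3]), 2)


def test_mean():
    # pytest style: a bare function and a plain assert statement.
    assert mean([1, 2, 3]) == 2
```

pytest will also collect and run unittest.TestCase classes, so the two styles can sit in the same suite while you migrate.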

You're wrong. Python is outpacing R in usage. Every metric you can find proves it. R also has fundamental issues and lacks serious development.
Not to dispute because I have no idea so I'll assume you're correct. But how many metrics did you find and how were they obtained? And how would you know they are representative of all R users?
For whatever it is worth, the TIOBE index lists Python as #1, R at #21.

Python is the first language many people are exposed to today. It has a library and tooling for every use case.

https://www.tiobe.com/tiobe-index/

R has a pretty particular use case though. Python use for statistical programming/data analysis would be the apples-to-apples comparison. People doing a coding 101 course in Python don't really count against the R user base.
No one is disputing that R has usage in niche arenas.
s/serious/hyped
No. R fundamentally has not really improved in the past ~10 years. Do you know much about how R works?

Also try:

gsub('serious', 'hyped', x)
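For the Python-only crowd, a rough equivalent of that gsub call is re.sub. A sketch, assuming x holds the sentence being edited (the comment above never defines x, so the value here is a guess at what was meant):

```python
import re

# Assumption: x is the sentence under discussion; the comment leaves x undefined.
x = "R also has fundamental issues and lacks serious development."

# Like gsub, re.sub treats the pattern as a regular expression
# and replaces every match, not just the first one.
result = re.sub("serious", "hyped", x)
print(result)
```

For a literal pattern like this one, `x.replace("serious", "hyped")` would do the same job without the regex machinery.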

Maybe because it already does what it intends to do reasonably well? I mean, what do you think needs to be improved?
Here are 14 years of HN discussions/criticisms of R: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

"Does what it intends to do reasonably well" is going to be widely subjective, depending on whether the user's use-case is statistical/life-sciences vs more general purpose coding and relying on many packages; prototyping/experimentation vs production code; whether the user uses base-R, or tidyverse/data.table, etc.

Here are two of those many posts:

* An opinionated view of the Tidyverse “dialect” of the R language (July 5, 2019) https://news.ycombinator.com/item?id=20362626

* The R programming language: The good, the bad, and the ugly (epatters.org, 2018) https://news.ycombinator.com/item?id=35571659 -> https://www.epatters.org/post/r-lang/

If you're unironically asserting that R already does everything well enough, I'm not going to take you seriously.
Yet you haven't provided any substantial points against it and assume that others will take you seriously...
I'm asking what needs to be improved, in your opinion.

That's a normal follow-up question that you should be able to answer. Otherwise, why are you even commenting?

The issue is that even if you peel away the hype (which is a fact), Python is still far larger.

If you check e.g. the Journal of Open Source Software (which does not have much ML/AI bias), most of the papers are Python, with an occasional R or Julia submission.