Hacker News
by fantispug 2556 days ago
This is missing the most important difference - deployability. R was built as a language to use interactively: it raises warnings for things that should be errors, requires an external package (packrat) for reproducible package management, and in general is foreign to most developers running operations. Python has good error handling, scripting and logging out of the box and manageable package management, and is familiar to most developers and operations staff. Python has much better libraries for building general-purpose tools (but fewer libraries for complex statistics).
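
To illustrate the point about error handling and logging, here is a minimal sketch using only the Python standard library. The names `parse_ratio`, `run` and the `batch_job` logger are hypothetical, made up for this example; the idea is just that a batch job can promote warnings to exceptions and log them instead of carrying on silently.

```python
import logging
import warnings

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("batch_job")  # hypothetical logger name


def parse_ratio(text):
    """Parse a ratio; warn (rather than fail) when it falls outside [0, 1]."""
    value = float(text)  # raises ValueError on malformed input
    if not 0.0 <= value <= 1.0:
        warnings.warn(f"ratio {value} outside [0, 1]", RuntimeWarning)
    return value


def run(inputs):
    """Process inputs strictly: inside this block every warning becomes an exception."""
    results = []
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # promote warnings to errors
        for text in inputs:
            try:
                results.append(parse_ratio(text))
            except (ValueError, RuntimeWarning) as exc:
                log.error("rejected %r: %s", text, exc)
    return results
```

Interactive use can leave the filter at its default so out-of-range values only warn, while a deployed job opts into the strict behaviour.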

I disagree with the "learning curve"; if you've learned other programming languages Python has a pretty simple and familiar core, and Pandas (while the API is an inconsistent mess) is well documented. Base R is quirky compared with modern programming languages, and the API is pretty inconsistent.

I also strongly disagree with the Tidyverse bashing. I'd say it has the shortest learning curve (especially for someone familiar with SQL), and is one of the main reasons I still use R today outside of deep learning - I find it much more friendly to work with than any alternative.

> This is missing the most important difference - deployability.

I've deployed both R and Python for a team of completely junior data scientists, on top of poorly managed infrastructure. I'd say they both have pros and cons and are actually both pretty bad. But R's packrat makes it slightly better than Python. Python is a mess when you want to reproduce a working environment. Conda and pip both have huge issues. R's package management is pretty poor too, with completely misleading errors, but at least there's only one system, and once you know your way around the most common errors you can build and run different projects quite consistently.

I've managed both RStudio+Shiny for R and Jupyter for Python, and overall my experience is better with the R stuff too. Things look a bit more standardized, while Jupyter needs tons of dependencies and (I felt) lacks a clear, opinionated way of doing things.

I have 0 opinion on the actual languages though, as I'm not a developer.

I have found that deploying and maintaining RStudio Server has been an absolute breeze, whereas JupyterHub (we use the systemd spawner) is kind of a pain. That said, my worst nightmares are the crossovers _between_ R and Python - getting R code that interacts with Reticulate to work and perform well, especially with all the MKL threading options, has taken so much effort and compromise I'd almost ban one language or the other and live with an unhappy team of data scientists.
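
For readers who haven't run into the threading options mentioned above, here is a minimal sketch of one common piece of it: capping numerical-library threads through environment variables. `MKL_NUM_THREADS` and `OMP_NUM_THREADS` are real variables read by MKL and OpenMP runtimes; the helper `cap_threads` is hypothetical, and this is an illustration of the knob, not a claim about what fixed the Reticulate setup described here.

```python
import os

# Environment variables consulted by MKL and OpenMP runtimes.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS")


def cap_threads(n=1, env=os.environ):
    """Set thread caps, keeping any value an operator has already chosen."""
    for var in THREAD_VARS:
        env.setdefault(var, str(n))  # only fills in variables that are missing
    return {var: env[var] for var in THREAD_VARS}
```

These variables typically need to be set before the numerical library is first loaded in the process, so a call like `cap_threads(1)` would go ahead of any `import numpy`.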
I have to agree - deploying Python code is horrible, unless the environment you're deploying to is super tightly locked down.

I had a colleague try to set me up with their RStudio project recently and we gave up because getting the packages installed was such a mess. So I'm not currently a huge fan of either. I don't do much data science or machine learning, but I work with and support people who do.

At least in my experience it has been pretty simple to deploy Python software.
Pretty simple is relative. I deploy Python applications to cloud instances using Docker through a git-push-based CI/CD setup. It works great, and I think it’s simple. But if I have to explain to an analyst how to use 3 different platforms and 5 or so tools to replicate what he currently gets by clicking “publish to RStudio connector” in the top right of his code, it seems obvious that’s not even close to being comparable.
> manageable package management

You're joking, right? Eggs, wheels, virtualenv, venv, pyenv, poetry, conda, Python 2.7, 3.4, 3.5, 3.6 (yes, we have all 4 versions installed in my current company's "production environment", not to mention 3 different versions of Python 2.7 but no 3.7)...

My experience is that pip and virtualenv (or venv in 3.3+) work pretty well in both interactive and deployment contexts. My main gripe is that pip doesn't resolve conflicts between downstream dependencies; the only way I could get a stable environment was by mapping out the dependencies and locking versions aggressively. Conda has some advantages for certain use cases (it handles libraries with installation dependencies well, and has a curated repository), but I've never felt a need to use poetry or pyenv. Why do you have so many versions of Python in production? I'd be surprised if there are many packages now that only work on specific versions (especially on 2.7 point releases).
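
As a rough illustration of aggressive version locking, here is a minimal sketch that flags any line of a requirements file not pinned with `==`. The helper `unpinned` and the `PINNED` pattern are hypothetical and deliberately simplified: lines with environment markers or URLs are treated as unpinned, and pip option lines such as `-r other.txt` are skipped.

```python
import re

# "Locked" here means exactly one exact pin: name==version (extras allowed).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=<>!~,;\s]+$")


def unpinned(requirements_text):
    """Return the requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

A check like this could run in CI so that a loose requirement fails the build before it reaches a deployment.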

I don't have much experience with packrat - but as opposed to pip it's another thing you need to discover and install. And so people don't do it by default when releasing code, and I've had to bisect versions of dependencies to get a working version of code. This can happen in Python too, but is rarer.

I got put right off python years ago when I ran into egg dependency hell, I'm glad it's not just me.
Yeah, even as someone who's been developing in Python since 1.5, dealing with dependencies and installing packages has always been my least favorite part of the language.

Fortunately conda came along and pretty much solved all my problems.

It is a mess but I always find myself using Python when I want something done quickly...
Deployability is indeed more complex with R, but at the same time it is far from the pyenv or virtualenv complexity associated with Python. Both have their quirks, and I don't think Python has peaked in how it deals with dependencies and reproducible environments.
R is also quite controversial because of its GPL license. I know that there are ways to overcome this issue, but most decision makers do not want to take the risk when they see that a product has such a license.
That's strange, is it ever relevant what the license of the language is? It hasn't stopped Linux.
It is not about the language - it is about the interpreter (implementation).

There are many aspects here:

* You simply use it (in interactive mode)

* You integrate it into your application as static lib

* You integrate it into your application as a dynamically linked lib

* You use it via some kind of (remote) API

* Do you integrate via source code, static lib, dynamic lib, API