by bionsystem 2556 days ago
> This is missing the most important difference - deployability.

I've deployed both R and Python for a team of completely junior data scientists, on top of poorly managed infrastructure. I'd say they both have pros and cons and are actually both pretty bad, but R's packrat makes it slightly better than Python. Python is a mess when you want to reproduce a working environment; conda and pip both have huge issues. R's package management is pretty poor too, with completely misleading errors, but at least there is only one tool, and once you know your way around the most common errors you can build and run different projects quite consistently.
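
For context on the reproducibility complaint: the usual first step on the Python side is pinning the exact version of everything installed, roughly what `pip freeze` produces. A minimal sketch using only the standard library's `importlib.metadata` (the helper name `pinned_requirements` is hypothetical) could look like this. Note that it records Python package versions only, not the interpreter version or system libraries, which is part of why pinning alone often fails to reproduce a working environment.

```python
from __future__ import annotations

from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for every installed distribution.

    Hypothetical helper: a rough stand-in for `pip freeze`. It ignores
    editable/VCS installs and anything outside Python (interpreter, C libs).
    """
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    # Case-insensitive sort so the output is stable and diff-friendly.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print instead of writing a file; redirect to e.g. requirements.lock.txt.
    print("\n".join(pinned_requirements()))
```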

I've managed both RStudio + Shiny for R and Jupyter for Python, and overall my experience has been better with the R stuff too. Things feel more standardized, while Jupyter needs tons of dependencies and (I felt) lacks a clear, opinionated way of doing things.

I have zero opinion on the actual languages, though, as I'm not a developer.

3 comments

I have found deploying and maintaining RStudio Server to be an absolute breeze, whereas JupyterHub (we use the systemd spawner) is kind of a pain. That said, my worst nightmares are the crossovers _between_ R and Python: getting R code that uses reticulate to work and perform well, especially with all the MKL threading options, has taken so much effort and compromise that I'd almost ban one language or the other and live with an unhappy team of data scientists.
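
On the MKL threading point: BLAS backends such as MKL typically read their thread count from environment variables when the library is first loaded, so the caps have to be in place before NumPy (or anything else linking MKL) is imported. A rough sketch, assuming the variables below are the ones your build honours; `cap_blas_threads` is a hypothetical helper. With reticulate, the equivalent would presumably be setting the same variables in the R session (for example via `Sys.setenv`) before Python is first initialized.

```python
import os

# Assumption: your NumPy/SciPy build links MKL, OpenBLAS, or an OpenMP runtime
# that reads these at load time; other backends may ignore some of them.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def cap_blas_threads(n: int) -> dict:
    """Set every thread-count variable to n and return the resulting values.

    Only effective if called before the BLAS library is loaded in this process.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    for var in THREAD_VARS:
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in THREAD_VARS}


if __name__ == "__main__":
    print(cap_blas_threads(1))
    # Import numpy (or start the work that loads MKL) only after this point.
```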

I have to agree: deploying Python code is horrible unless the environment you're deploying to is super tightly locked down.

I had a colleague try to set me up with their RStudio project recently, and we gave up because getting the packages installed was such a mess. So I'm not currently a huge fan of either. I don't do much data science or machine learning, but I work with and support people who do.

At least in my experience, it has been pretty simple to deploy Python software.

Pretty simple is relative. I deploy Python applications to cloud instances using Docker through a git-push-based CI/CD setup. It works great, and I think it's simple. But if I have to explain to an analyst how to use three different platforms and five or so tools to replicate what he currently gets by clicking “publish to RStudio Connect” in the top right of his code, it seems obvious the two aren't even close to comparable.