by civilized 1719 days ago
R certainly isn't the best choice for production predictions (though it can work). But it's not hard to translate well-designed R processing pipelines and models into other languages if you must. The problem is that R programmers often don't know how to write good code in any language.

Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills.

The solution is for production engineers to understand just enough R to set standards for data scientists' code, so that the models can be reliably translated to the production language. As with JS, you can complain about the yucky parts, or you can accept that it's the best tool for some jobs and make an effort to work around the yucky parts, or use the tools of people who are already doing that (e.g. Wickham and the tidyverse).
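
To make "reliable translation" a bit more concrete, here is a rough sketch of what the production side could look like if the standard were "every model ships its coefficients as plain data". It assumes a logistic regression fit with glm() in R whose coef() output has been dumped to JSON. The JSON layout, the feature names (age, income_k), the numbers, and the helper names load_model and predict_proba are all made up for illustration, not any established convention.

```python
import json
import math

# Hypothetical export from the R side: the named vector from coef(fit),
# serialized to JSON. Feature names and values are invented for this sketch.
EXPORTED_MODEL = """
{
  "link": "logit",
  "coefficients": {"(Intercept)": -1.5, "age": 0.03, "income_k": 0.01}
}
"""


def load_model(raw):
    """Parse the exported model and refuse anything this port can't handle."""
    model = json.loads(raw)
    if model["link"] != "logit":
        raise ValueError("only the logit link is supported in this sketch")
    return model["coefficients"]


def predict_proba(coefs, row):
    """Apply the inverse logit to the linear predictor for one input row."""
    eta = coefs["(Intercept)"]
    for name, beta in coefs.items():
        if name == "(Intercept)":
            continue
        # A missing feature raises KeyError instead of silently scoring 0.
        eta += beta * row[name]
    return 1.0 / (1.0 + math.exp(-eta))


coefs = load_model(EXPORTED_MODEL)
p = predict_proba(coefs, {"age": 40, "income_k": 30})
print(round(p, 6))
```

The point of keeping it this dumb is that you can check the port against predict(fit, type = "response") on the same rows in R. Anything with factor levels, splines, or custom preprocessing needs those steps written down just as explicitly, which is exactly where the standards come in.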

If you want data scientists to produce production-ready results, you have to hold them to the standards of production engineering.

1 comment

"Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills."

Huh?

While I totally agree with that statement, I'd think it applies a lot more to Python than to R, especially given that Python seems to be the dominant "first language for people to learn when they get into programming" because it is "easy".

The proportion of users with thin engineering skills is higher in R because the community of software engineers working in R is a lot smaller. R coders are overwhelmingly data analysts, while Python coders have more diverse roles. People who use R are also much more likely to have learned R, and only R, in university courses for a data science-related degree, especially if that degree is in statistics.
R is a language people pick up when they get into statistics, often without thinking of it as programming at all.