by baldfat 3556 days ago
> It seems that once you figure out a good model in R, it's almost always rewritten into either Scala or Java for real production work.

I wouldn't say even 1% of programs written in R need that speed. I personally use it for small projects (besides a few Spark side projects), and I am outputting reports.

I really would like someone to show an actual example of this happening in 2016.


I do it at my company. I prototype in R, and then end up having to rewrite chunks of it in Python so it can be worked into our application, which right now is exclusively Python.

It's not a matter of performance. It's just that it would be an enormous amount of engineering overhead to start calling R from inside the Python app.

Check out opencpu.org, it's an R web API. Really cool stuff.
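
For anyone wondering what that looks like from the Python side, here is a minimal sketch using only the standard library. It assumes an OpenCPU server reachable at a hypothetical `http://localhost:5656`, and it assumes the `/ocpu/library/<package>/R/<function>/json` endpoint layout with a JSON request body; the helper names `build_opencpu_request` and `call_opencpu` are made up for illustration. Check the OpenCPU docs before relying on any of it.

```python
import json
import urllib.request

# Hypothetical server address; point this at your own OpenCPU instance.
BASE_URL = "http://localhost:5656"


def build_opencpu_request(base_url, package, function, args):
    """Build (but do not send) a POST request that calls an R function.

    Assumes the endpoint layout /ocpu/library/<package>/R/<function>/json
    and that the server accepts function arguments as a JSON body.
    """
    url = f"{base_url.rstrip('/')}/ocpu/library/{package}/R/{function}/json"
    body = json.dumps(args).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_opencpu(request, timeout=10):
    """Send the request and decode the JSON response (needs a live server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Ask R for three random normal draws: stats::rnorm(n = 3, mean = 5)
request = build_opencpu_request(BASE_URL, "stats", "rnorm", {"n": 3, "mean": 5})
print(request.get_method(), request.full_url)
```

The point is that the Python app only ever sees HTTP and JSON, so R stays behind a service boundary instead of becoming an in-process dependency.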
It seems like you could simply use http://jupyter.org/ and just run the script with R code inline.

http://blog.revolutionanalytics.com/2016/01/pipelining-r-pyt...

Also, why not just switch to Pandas? It really is a pretty close R clone.

It has nothing to do with interoperability on my machine. I use notebooks (and Pandas) all the time, and I consider myself fluent in both R and Python.

It's because R is a substantial engineering dependency. As I said, our entire stack is Python and Node. Yes, you can call R from Python using rpy2, but that's a pro bono project maintained largely by one person. It's great for casual use, but there is far too much risk to start talking about building critical business code around it.
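
For context, the casual-use case looks roughly like the sketch below. It assumes rpy2 and a working R installation are both present; the helper `r_mean` is a made-up name for illustration, and it simply returns `None` when the bridge cannot be loaded, which is exactly the kind of failure mode you would have to plan for in production.

```python
def r_mean(values):
    """Return mean(values) as computed by R, or None if rpy2/R is unavailable."""
    try:
        # Importing rpy2 starts an embedded R, so it needs R installed too.
        import rpy2.robjects as ro
    except Exception:
        return None
    vec = ro.FloatVector(values)   # copy the Python floats into an R numeric vector
    result = ro.r["mean"](vec)     # look up R's mean() and call it
    return float(result[0])        # R returns a length-1 vector


print(r_mean([1.0, 2.0, 3.0]))
```

Even in this tiny case the app now depends on two runtimes and the binding between them, which is the dependency cost described above.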

So why not Pandas?
Personal preference. I switch back-and-forth based on the project.

R data frames are native and feel native. Pandas data frames are non-native and can be a pain in the ass to work with.

That, and there is a lot more to the decision than just which data frame implementation I like better.

"Pretty close" as long as you stay within the region of common functionality. I wouldn't say it's a clone.
That is true. I actually started my journey with Pandas and then switched to R for the ecosystem, and zero-based indexing for data science drove me nuts.
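
A small illustration of the indexing difference, in plain Python with the R equivalents in comments. The helper `r_index` is hypothetical, just to show the off-by-one translation; note that it raises on out-of-range indexes, whereas R would return NA.

```python
# Python sequences are zero-based with half-open slices;
# R vectors are one-based with inclusive ranges (shown in comments).
scores = [10, 20, 30, 40]

first = scores[0]        # R: scores[1]
first_two = scores[0:2]  # R: scores[1:2]
last = scores[-1]        # R: scores[length(scores)]; in R, scores[-1] drops the first element


def r_index(seq, i):
    """Hypothetical helper: look up seq with an R-style one-based index."""
    if i < 1 or i > len(seq):
        raise IndexError(f"one-based index {i} out of range")
    return seq[i - 1]


print(first, first_two, last, r_index(scores, 1))
```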

But I do feel that the goal is a clone.

"Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R." http://pandas.pydata.org/

How much experience do you have in statistical computing, out of curiosity?