by larrydag 3024 days ago
I'm a statistician by trade, so I mostly do prototype work. For building models, my key workflow is R and RStudio. The biggest issue is data management. I suggest a good API or wrapper for a data source that has most of the ETL already done. R connects very well to most database systems. RStudio makes development easier with connectivity to GitHub and other popular version control services.
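To make the "wrapper with the ETL already done" idea concrete, here is a minimal sketch in Python (the same shape would apply in R). Everything in it is a hypothetical stand-in: the `sales` table, its columns, and the `load_clean_sales` and `demo_connection` helpers are invented for illustration, and an in-memory SQLite database takes the place of a real data source.

```python
import sqlite3


def load_clean_sales(conn):
    """Return analysis-ready (region, amount) rows from a hypothetical sales table.

    The cleaning lives here so modeling code never sees raw data:
    rows with a missing region or amount are dropped, and region
    names are trimmed and lower-cased.
    """
    rows = conn.execute(
        "SELECT region, amount FROM sales "
        "WHERE region IS NOT NULL AND amount IS NOT NULL "
        "ORDER BY rowid"
    ).fetchall()
    return [(region.strip().lower(), float(amount)) for region, amount in rows]


def demo_connection():
    """Build an in-memory database with a few messy rows (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(" North", 10.0), ("south ", None), ("SOUTH", 5.5)],
    )
    return conn


if __name__ == "__main__":
    print(load_clean_sales(demo_connection()))
```

The point of the design is that the model-building side calls one function and gets tidy rows back, whatever the underlying database happens to be.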

As for putting models into production, I'm not as familiar. I hear that a good Python workflow would probably work best.


Also, RStudio works well with Bitbucket for those who want private repos for free.
For private repos, I'd say GitLab is an order of magnitude (or two) better than Bitbucket. Or at least it clearly was two years ago; I haven't kept up with Bitbucket, but GitLab has improved by leaps and bounds in those two years.

The killer features for me are nested subgroups (which Bitbucket may have, but GitHub does not) and a really awesome CI system with a generous free tier (2000 minutes/month). For R packages, we have it set up very similarly to GitHub + Travis (devtools::check() on every push), and for deployable bits we have it build containers and run integration tests on them. Super impressed with all we get for free there.

Great point.

Also, regarding the production environment: the key is having the same machine learning libraries available in both development and production, so you can plug the model in with few dependency problems. For this reason, most folks who go into production, particularly with web applications, tend to both develop and deploy in Python, Java, etc.
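One way to check that development and production really have the same libraries is to compare pinned versions against what is installed. This is a rough sketch, assuming pins are written as simple `name==version` lines; `parse_pins` and `find_version_mismatches` are hypothetical helper names, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_version_mismatches(pinned):
    """Return {package: (expected, installed)} for every package that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative pins only; in practice these would come from the
    # file recorded in the development environment.
    pins = parse_pins(["# model dependencies", "some-hypothetical-package==1.0.0"])
    for name, (expected, installed) in find_version_mismatches(pins).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running a check like this at deploy time catches a mismatched library before the model is loaded, rather than after it misbehaves.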