by sandGorgon
3777 days ago
How do you guys run R in production? I'm just getting started with R-based data science, and it has been a struggle to figure out how to build a production data science stack. Do you snapshot the computed models as RData files and stream them to S3, etc.?
1. For batch processes that run daily, hourly, or every minute, where the models are rebuilt on every run and the outputs (often predictions) are written to a database.
2. For computing coefficients in large, sparse, regularized models, where the coefficients are written to a database and scoring is done in real time in another language.
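To make the second pattern concrete, here is a minimal sketch of the "score in another language" side, written in Python. It assumes (hypothetically) that the R job writes only the nonzero coefficients of the fitted model to a table `model_coefficients(feature, weight)`, using R's `coef()` naming of `"(Intercept)"` for the intercept. The table and column names, the logistic link, and the in-memory SQLite database standing in for the production database are all illustrative assumptions, not the poster's actual setup.

```python
import math
import sqlite3


def load_coefficients(conn):
    """Read (feature, weight) rows; split out the intercept (R names it "(Intercept)")."""
    rows = conn.execute("SELECT feature, weight FROM model_coefficients").fetchall()
    coefs = dict(rows)
    intercept = coefs.pop("(Intercept)", 0.0)
    return intercept, coefs


def score(intercept, coefs, features, logistic=True):
    """Score one sparse observation.

    `features` maps feature name -> value. Features absent from `coefs` contribute
    nothing, which matches a regularized fit where zeroed coefficients were never stored.
    """
    z = intercept + sum(coefs.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z)) if logistic else z


# Demo: an in-memory table standing in for what the hypothetical R batch job would write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model_coefficients (feature TEXT PRIMARY KEY, weight REAL)")
conn.executemany(
    "INSERT INTO model_coefficients VALUES (?, ?)",
    [("(Intercept)", -1.0), ("clicks_7d", 0.5), ("is_mobile", 0.25)],
)
intercept, coefs = load_coefficients(conn)
print(score(intercept, coefs, {"clicks_7d": 2.0, "country_xx": 1.0}, logistic=False))  # 0.0
```

Loading the coefficients once into a dictionary keeps each real-time score to a few lookups and multiplications, which is the appeal of fitting in R but scoring elsewhere.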
For situations where we want real-time predictions, recommendations, or optimizations, we tend to set up Python services instead. For batch processes, you can definitely store models in S3 to reuse them, and I've done that at other companies. In general, though, I've found it better to rebuild models frequently, and to cache them for short periods only if they are cost-prohibitive to rebuild.
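The "rebuild often, cache briefly" policy can be sketched as a small time-to-live cache. This is a minimal illustration in Python, not the poster's code: `ModelCache`, `build_fn`, and `ttl_seconds` are hypothetical names, the 600-second TTL is arbitrary, and persisting to S3 is not shown. The clock is injectable so the expiry logic can be exercised without waiting.

```python
import time


class ModelCache:
    """Hold a built model for at most ttl_seconds, then rebuild it on the next request."""

    def __init__(self, build_fn, ttl_seconds, clock=time.monotonic):
        self._build_fn = build_fn      # hypothetical: whatever (re)trains the model
        self._ttl = ttl_seconds
        self._clock = clock
        self._model = None
        self._built_at = None          # None means "never built"

    def get(self):
        now = self._clock()
        if self._built_at is None or now - self._built_at >= self._ttl:
            self._model = self._build_fn()
            self._built_at = now
        return self._model


# Demo with a fake clock so expiry is deterministic.
builds = []

def build_model():
    builds.append(1)
    return {"version": len(builds)}

fake_now = [0.0]
cache = ModelCache(build_model, ttl_seconds=600, clock=lambda: fake_now[0])
cache.get()          # t=0: builds version 1
fake_now[0] = 300.0
cache.get()          # t=300: within the TTL, reuses version 1
fake_now[0] = 900.0
print(cache.get())   # t=900: TTL expired, rebuilds -> {'version': 2}
```

Keeping the TTL short means a stale model is never served for long, while an expensive model is still built at most once per window.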