by jeremystan
3778 days ago
We use R in production in two ways:

1. For batch processes that run daily, hourly or minutely, where the models are rebuilt on every run and the outputs (often predictions) are written to a database.
2. For computing coefficients in large, sparse, regularized models, where the coefficients are written to a database and scoring is done in another language in real time.

For situations where we want real-time predictions, recommendations or optimizations, we tend to set up Python services instead.

For batch processes, you can definitely store models in S3 to reuse them, and I've done that at other companies. But in general I've found it better to rebuild models frequently, and to cache them for short periods only if they are cost-prohibitive to rebuild.
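A minimal sketch of the second pattern (fit in R, score elsewhere), to show why it works: once a sparse linear model's coefficients live in a database, scoring is just a dot product. Everything here is an assumption for illustration: the batch job is presumed to write one row per non-zero coefficient to a hypothetical `coefficients(feature, weight)` table, the intercept is stored under R's usual `(Intercept)` name, the model is a logistic regression, and `sqlite3` stands in for the production database.

```python
import math
import sqlite3


def load_coefficients(conn):
    """Read the (feature, weight) rows written by the batch job into a dict.

    Assumes a hypothetical table `coefficients(feature TEXT, weight REAL)`
    holding only the non-zero coefficients of a sparse regularized model.
    """
    rows = conn.execute("SELECT feature, weight FROM coefficients")
    return {feature: weight for feature, weight in rows}


def score(coefs, features, intercept_key="(Intercept)"):
    """Logistic score for one sparse observation given as {feature: value}.

    Features with no stored coefficient (zeroed out by regularization, or
    never seen in training) contribute nothing to the linear predictor.
    """
    z = coefs.get(intercept_key, 0.0)
    for name, value in features.items():
        z += coefs.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # In-memory stand-in for the table the R job would populate.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE coefficients (feature TEXT PRIMARY KEY, weight REAL)")
    conn.executemany(
        "INSERT INTO coefficients VALUES (?, ?)",
        [("(Intercept)", -1.0), ("clicked_email", 0.5), ("visits", 0.25)],
    )
    coefs = load_coefficients(conn)
    # z = -1.0 + 0.5*1 + 0.25*2 = 0.0, so the score is 0.5; "unseen" is ignored.
    print(score(coefs, {"clicked_email": 1, "visits": 2, "unseen": 9}))
```

Because the scoring service only needs the coefficient table, it can reload it on a timer whenever the R job publishes a new fit, without ever running R on the request path.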
Also, about scoring in another language: is this really worthwhile for you? I have often debated just throwing 128GB of RAM at an R machine and calling it a day. My guess is that your "real-time" requirements are probably on the order of seconds or even minutes (similar to mine).