by oli5679 2418 days ago
I'd recommend exporting the R model as a PMML file and having your Java team interact with an Openscoring server.

PMML is a language-agnostic, XML-based model specification. The Python and R machine learning ecosystems can generate it easily (caveat: I've only tried this for GBDT and linear models, and I'm not sure how well it works for neural nets).
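To make "language agnostic" concrete, here is a minimal sketch that reads a linear model out of a PMML document using only the Python standard library. The PMML text is a hand-written toy model (y = 1.5 + 2.0 * x), not the output of any real export, and `load_linear_model` and `predict` are hypothetical helper names. A real scorer would handle far more of the spec than this.

```python
import xml.etree.ElementTree as ET

# Hand-written toy PMML for y = 1.5 + 2.0 * x (illustrative, not a real export).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_linear_model(pmml_text):
    """Return (intercept, {feature: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefs


def predict(model, row):
    """Score one row (a dict of feature values) with the loaded linear model."""
    intercept, coefs = model
    return intercept + sum(c * row[name] for name, c in coefs.items())


model = load_linear_model(PMML_DOC)
print(predict(model, {"x": 3.0}))
```

The coefficients live in the file, so any runtime that can parse XML can score the model without them being baked into application code.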

Openscoring is a Java library that exposes a REST API for scoring models. It's lightweight and battle-tested, has a nice API and good model versioning, and in my experience is about 10x faster than Python Flask. You don't need to write any Java code: just download and run the .jar, then post valid PMML to the right endpoint.
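A rough sketch of what talking to the server could look like from Python, using only the standard library. The base URL, the `/model/{id}` path, the use of PUT for deployment and the `{"id": ..., "arguments": ...}` payload shape are my recollection of the Openscoring README and should be treated as assumptions; check the current docs before relying on them. The sketch only builds the requests and does not send anything.

```python
import json
import urllib.request

# Assumed address of a locally running Openscoring server.
BASE_URL = "http://localhost:8080/openscoring"


def deploy_request(model_id, pmml_bytes):
    """Build (but do not send) a request that uploads a PMML document."""
    return urllib.request.Request(
        f"{BASE_URL}/model/{model_id}",
        data=pmml_bytes,
        method="PUT",  # assumption: deployment uses PUT on the model URL
        headers={"Content-Type": "application/xml"},
    )


def score_request(model_id, arguments, request_id="example-001"):
    """Build (but do not send) a request that scores one record."""
    body = json.dumps({"id": request_id, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/model/{model_id}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = score_request("MyModel", {"x": 3.0})
print(req.get_method(), req.full_url)
# To actually send it against a running server: urllib.request.urlopen(req)
```

The Java team would issue the same HTTP calls from their own client, which is the point: nobody has to port the model.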

Another feasible approach is a SageMaker deployment: code in a Jupyter notebook can deploy an API in one line. I think this can be less economical and have higher latency under heavy usage, but a data scientist can update the model from within a notebook.

Please NEVER hardcode regression model coefficients in Java. It's a nightmare to maintain, it prevents you from increasing model complexity, and it's no simpler than PMML + Openscoring. I think you can wrap the Java PMML library in another Java web framework such as Spring if you need something more bespoke.

https://www.rdocumentation.org/packages/pmml/versions/2.1.0/...

https://github.com/openscoring/openscoring

https://aws.amazon.com/blogs/machine-learning/using-r-with-a...

1 comment

Looks very interesting. Will definitely explore in this direction.

> Please NEVER hardcode regression model coefficients within Java.

Amen to that.

If you aren't wedded to R, then pickling an sklearn Pipeline and loading it in a Flask app can be nice. The advantage is that data pre-processing can also be included in the sklearn Pipeline.

https://scikit-learn.org/stable/modules/generated/sklearn.pi...
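A minimal sketch of the pickling idea, assuming scikit-learn and NumPy are installed. The toy data (y = 2x + 1) and the step names "scale" and "model" are made up for illustration. The Flask part is left out to keep it short.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data following y = 2x + 1 (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0

# Pre-processing and model live in one object, so they are saved together.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
pipeline.fit(X, y)

# Serialize to bytes; in practice this would be written to a file.
blob = pickle.dumps(pipeline)

# In the serving process: restore once at startup, then call predict per request.
restored = pickle.loads(blob)
print(restored.predict(np.array([[4.0]])))
```

In a Flask app you would load the pickle once when the app starts and call `restored.predict` inside the route handler. Only unpickle files you trust, and keep the scikit-learn version the same between training and serving.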

The bit I'm not sure how to do well is model monitoring.