| The one I’m working with _now_ is very low tech: daily Python jobs processing data from GCP and writing back to GCP, plus a handful of scripts that check everything is reasonable. That’s because we serve internal results, mostly read by humans. The most impressive setup that I’ve seen run live is described here:
https://towardsdatascience.com/rendezvous-architecture-for-d... I’d love to have feedback from more than Jan because I’m planning on encouraging it internally.

The best structure that I’ve seen at scale (at a top 10 company) was:
- a service that hosted all models, written in Python or R, stored as Java objects (built with a public H2O library);
- data scientists could upload models by drag-and-drop on a bare-bones internal page;
- each model was versioned (data and training not separate) by name, using a basic folders/namespace/default version increment approach;
- all models were run using Kubernetes containers; each model used a standard API call to serve individual inferences;
- models could use other models’ output as input, taking the parent-model inputs as their own in the API;
- to simplify that, most models were encouraged to use a session id or user id as their single entry, and most inputs were gathered from an all-encompassing live storage connected to that model-serving structure;
- every model had extensive monitoring of the distribution of inputs (especially missing values), of outputs, and of response delay, to make sure inputs and outputs matched expectations from the trained model; e.g.: if you are training a fraud model and more than 10% of outputs in the last hour were positive, warn the DS to check and consider calling security; another example: if more than 5% of “previous page looked at” are empty, there’s probably a pipeline issue;
- there were some issues with feature engineering: I feel like the solution chosen was suboptimal because it created two data pipelines, one for live and one for storage/training. For that problem, I’d recommend that architecture instead: https://www.datasciencefestival.com/video/dsf-day-4-trainlin... |
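To make the two monitoring examples above concrete, here is a minimal Python sketch. It is not the company's actual implementation: the names `Alert`, `rate` and `check_window` are hypothetical, and it assumes that the last hour of fraud-model outputs arrives as a list of 0/1 labels and that "previous page looked at" arrives as a list of strings where `None` or an empty string means missing. The 10% and 5% thresholds are the ones quoted in the comment.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Alert:
    """One threshold breach found in a monitoring window (hypothetical type)."""
    check: str
    observed: float
    threshold: float


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True values; 0.0 for an empty window."""
    return sum(flags) / len(flags) if flags else 0.0


def check_window(
    outputs: Sequence[int],
    previous_pages: Sequence[Optional[str]],
    max_positive_rate: float = 0.10,  # "more than 10% of outputs positive"
    max_missing_rate: float = 0.05,   # "more than 5% of previous page empty"
) -> List[Alert]:
    """Compare one window of live traffic against the two example thresholds."""
    alerts: List[Alert] = []

    # Output distribution: share of positive (fraud) predictions in the window.
    positive_rate = rate([o == 1 for o in outputs])
    if positive_rate > max_positive_rate:
        alerts.append(Alert("fraud_positive_rate", positive_rate, max_positive_rate))

    # Input distribution: share of missing "previous page looked at" values.
    missing_rate = rate([p is None or p == "" for p in previous_pages])
    if missing_rate > max_missing_rate:
        alerts.append(Alert("previous_page_missing_rate", missing_rate, max_missing_rate))

    return alerts


if __name__ == "__main__":
    # Assumed sample window: 2 positives out of 10, no missing inputs.
    for alert in check_window([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], ["home"] * 10):
        print(f"{alert.check}: {alert.observed:.0%} exceeds {alert.threshold:.0%}")
```

In a real setup the same function would presumably run on a schedule over a sliding window, and the alert would be routed to the DS who owns the model; how that was wired up is not described above.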
An intern showed a proof of concept of such a model based on one product, and it's fantastic work that could save thousands of dollars, but we're struggling with how to "qualify" it. How do we know we won't get a "garbage in/garbage out" situation?