| So you want to figure out how often you need to sample products for QA? A model is two things: a description of what's in the black box (could be a linear model, a neural network architecture, etc.) and some weights which uniquely define "that specific model". Each model will have some known input (e.g. an image, tabular data) and output (e.g. a number, an image, a list). You need to store both the structure and the weights: for example, if your model is y = mx + c, you need to know m and c to uniquely define it. To answer your second question: robustness comes down to a smart test strategy. Train on representative data, validate during training on a second dataset, and test on hold-out data that the model has never seen. Unfortunately it's quite hard to prove model robustness (in the case of deep learning anyway), so you have to assume that you trained on enough realistic data. If you really have no idea about robustness, then you should probably do a kind of soft launch. Run your model in production alongside what you currently use, and see whether the output makes sense. You could try, for example, sampling with your current strategy as well as with the schedule defined by your ML model (so you lose nothing but a bit of time if the ML system is crap). Then compare the two datasets and see whether the ML model performs at least as well as your current method. Surely you can make some naive estimates of robustness though? E.g. if the model says to sample 5% of your product, then for an assumed number of defective items you can put a bound on the chance that you miss all of them. |
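
To make the naive estimate at the end concrete, here is a minimal sketch. It assumes each item is picked for inspection independently with the same probability, and that inspecting a defective item always catches it. Neither assumption comes from your setup, and the function names `miss_probability` and `required_fraction` are made up for illustration. Under those assumptions, the chance that none of k defective items lands in the sample is (1 - f)^k for a sampling fraction f.

```python
def miss_probability(sample_fraction: float, n_defects: int) -> float:
    """Chance that none of n_defects defective items ends up in the sample.

    Assumption: every item is sampled independently with probability
    sample_fraction, and an inspected defective item is always detected.
    """
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    if n_defects < 0:
        raise ValueError("n_defects must be non-negative")
    return (1.0 - sample_fraction) ** n_defects


def required_fraction(max_miss_probability: float, n_defects: int) -> float:
    """Smallest sampling fraction that keeps the miss chance at or below the target.

    Inverts (1 - f) ** k = p for f, under the same independence assumption.
    """
    if not 0.0 < max_miss_probability <= 1.0:
        raise ValueError("max_miss_probability must be in (0, 1]")
    if n_defects < 1:
        raise ValueError("n_defects must be at least 1")
    return 1.0 - max_miss_probability ** (1.0 / n_defects)


if __name__ == "__main__":
    # Hypothetical scenario: the model recommends sampling 5% of the product.
    for k in (1, 10, 50):
        p = miss_probability(0.05, k)
        print(f"{k} defective items: chance of missing all of them = {p:.3f}")
```

With a 5% sample and 10 defective items in the run, you would still miss all of them roughly 60% of the time under these assumptions, which is the kind of number you can sanity-check against what your current sampling strategy achieves.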