|
|
|
|
|
by georgeburdell
2620 days ago
|
|
What does "model" mean? What kind of data are contained in a machine learning model? Second, how do you decide a model is robust? I'm asking because I'm looking at using ML to more efficiently use some quality assurance tools for a product line. The idea is to develop a model such that product A, B, C can have existing (or supplementary data) QA data plugged into a model, and then an appropriate sampling plan can be output. An intern showed proof of concept of such a model based on one product, and it's fantastic work that could save thousands of dollars, but we're struggling with how to "qualify" it. How do we know we won't get a "garbage in/garbage out" situation? |
|
A model is two things: a description of what's in the black box (could be a linear model, a neural network architecture, etc) and some weights which uniquely define "that specific model". Each model will have some known input (eg image, tabular data) and output (eg number, image, list etc).
You need to store both the structure and weights: for example your model is y = mx + c, but you need to know m and c to uniquely define it.
To answer your second question robustness means a smart test strategy. Train on representative data, validate during training on a second data set and test on hold-out data that the model has never seen.
Unfortunately it's quite hard to prove model robustness (in the case of deep learning anyway), you have to assume that you trained on enough realistic data.
If you really have no idea about robustness, then you should probably do a kind of soft-launch. Run your model in production alongside what you currently use, and see whether the output makes sense.
You could try, for example, sampling with your current strategy as well as the schedule defined by your ML model (so you lose nothing but a bit of time if the ML system is crap). Then compare the two datasets and see whether the ML model is at least performing the same as your current method.
Surely you can make some naive estimates of robustness though? eg if the model says sample 5% of your product, you then have a bound on the chance that you miss something.