by georgeburdell 2620 days ago
What does "model" mean? What kind of data is contained in a machine learning model? Second, how do you decide that a model is robust? I'm asking because I'm looking at using ML to make more efficient use of some quality assurance tools for a product line. The idea is to develop a model such that products A, B, and C can have their existing (or supplementary) QA data plugged in, and an appropriate sampling plan comes out.

An intern showed a proof of concept of such a model based on one product, and it's fantastic work that could save thousands of dollars, but we're struggling with how to "qualify" it. How do we know we won't get a "garbage in/garbage out" situation?

2 comments

So you want to figure out how often you need to sample products for QA?

A model is two things: a description of what's in the black box (could be a linear model, a neural network architecture, etc) and some weights which uniquely define "that specific model". Each model will have some known input (eg image, tabular data) and output (eg number, image, list etc).

You need to store both the structure and weights: for example your model is y = mx + c, but you need to know m and c to uniquely define it.
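To make the structure/weights split concrete, here is a minimal Python sketch of the y = mx + c case. The class and helper names (LinearModel, save_weights, load_weights) are made up for illustration, and JSON is just one possible storage format, not what any particular ML library does.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LinearModel:
    """Structure: y = m*x + c. Weights: the concrete values of m and c."""
    m: float
    c: float

    def predict(self, x: float) -> float:
        return self.m * x + self.c


def save_weights(model: LinearModel) -> str:
    # Only the weights are serialized; the structure lives in the class itself.
    return json.dumps(asdict(model))


def load_weights(blob: str) -> LinearModel:
    # Rebuilding the model needs both the class (structure) and the blob (weights).
    return LinearModel(**json.loads(blob))


model = LinearModel(m=2.0, c=1.0)
restored = load_weights(save_weights(model))
print(restored.predict(3.0))  # 2.0 * 3.0 + 1.0
```

The weights alone ({"m": 2.0, "c": 1.0}) mean nothing unless you also know they belong to a straight line, which is why both halves have to be kept together.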

To answer your second question: robustness comes down to a sound test strategy. Train on representative data, validate during training on a second data set, and test on hold-out data that the model has never seen.
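A rough sketch of that three-way split, using only the standard library. The 70/15/15 proportions and the function name three_way_split are my assumptions, not a rule; for QA data collected over time you may prefer to hold out the most recent lots rather than shuffle.

```python
import random


def three_way_split(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into train / validation / hold-out test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # never touched until the final evaluation
    return train, val, test


records = list(range(100))  # stand-in for real QA records
train, val, test = three_way_split(records)
print(len(train), len(val), len(test))
```

The important part is that the three sets do not overlap and that the test set is only looked at once, at the end.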

Unfortunately, it's quite hard to prove model robustness (in the case of deep learning, anyway); you have to assume that you trained on enough realistic data.

If you really have no idea about robustness, then you should probably do a kind of soft-launch. Run your model in production alongside what you currently use, and see whether the output makes sense.

You could try, for example, sampling with your current strategy as well as the schedule defined by your ML model (so you lose nothing but a bit of time if the ML system is crap). Then compare the two datasets and see whether the ML model is at least performing the same as your current method.
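A side-by-side comparison like that could be tallied roughly as follows. Everything here is hypothetical: the LotResult fields, the lot ids, and the numbers are invented purely to show the shape of the comparison, and a real comparison would also need to account for random variation between the two samples.

```python
from dataclasses import dataclass


@dataclass
class LotResult:
    lot_id: str
    units_sampled: int
    defects_found: int


def summarize(results):
    """Total inspection effort and total defects caught for one sampling plan."""
    return {
        "units_sampled": sum(r.units_sampled for r in results),
        "defects_found": sum(r.defects_found for r in results),
    }


def lots_where_candidate_found_fewer(current, candidate):
    """Lot ids where the candidate plan caught fewer defects than the current plan."""
    baseline = {r.lot_id: r.defects_found for r in current}
    return [r.lot_id for r in candidate
            if r.defects_found < baseline.get(r.lot_id, 0)]


# Invented numbers: same lots, inspected under both plans in parallel.
current_plan = [LotResult("lot-1", 50, 2), LotResult("lot-2", 50, 1)]
ml_plan = [LotResult("lot-1", 30, 2), LotResult("lot-2", 20, 1)]

print(summarize(current_plan))
print(summarize(ml_plan))
print(lots_where_candidate_found_fewer(current_plan, ml_plan))
```

If the ML plan inspects fewer units and the list of worse lots stays empty over a reasonable period, that is at least some evidence it performs no worse than the current method.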

Surely you can make some naive estimates of robustness though? eg if the model says sample 5% of your product, you then have a bound on the chance that you miss something.
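One naive way to put a number on that: if you draw a simple random sample without replacement, the chance that the sample contains none of the defective units follows the hypergeometric distribution. The sketch below assumes the sample really is random and that inspection catches every defective unit it sees; the function name and the lot sizes are just for illustration.

```python
from math import comb


def prob_miss_all(lot_size: int, defectives: int, sample_size: int) -> float:
    """P(no defective unit ends up in a random sample drawn without replacement)."""
    # Ways to pick the whole sample from good units / ways to pick any sample.
    return comb(lot_size - defectives, sample_size) / comb(lot_size, sample_size)


# Lot of 1000 units, 5% sampled, assuming 10 defective units are present.
p = prob_miss_all(lot_size=1000, defectives=10, sample_size=50)
print(f"chance of missing all 10 defectives: {p:.3f}")
```

For large lots this is close to (1 - sampling fraction) raised to the number of defectives, so a 5% sample only catches rare defects reliably if there are quite a few of them.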

1. What I’m working on at the moment is AB-testing, so no real models there; plenty of simulations and tests though.

2. There are several videos of Jan describing his work, including that one, so I’ll let him give examples of what he means by models: https://www.datasciencefestival.com/video/dsf-day-2-jan-teic...

3. At the big company, it’s an e-commerce website with many products along many dimensions, so the models cover which aspects of a product customers would be interested in, whether they are likely to commit to purchasing now or are just browsing, and price sensitivity against other factors. They typically have non-authenticated users, so they have to guess a lot about the users from time of day, country of connection, type of device used, and browsing rhythm. The inferences are not perfect, but they inform how the product is presented, and have a meaningful impact on conversion.

4. In the presentation at Trainline, they are not explicit about what models they have in mind, but it’s also an e-commerce company, so many of the decisions are similar.

One unique problem they have talked about openly before (UK train companies are not really reactive but British people love their festivals, championship matches, protests, horse- and dog-races and drinking during all of the above): they deal with the occasional crowded train, so they are trying to predict whether a train is going to be swamped and whether the person booking is going to the event in question. If the person is not going to the event, they’d rather avoid the loud fans or drunken top-hatted horse-owners.

For all of the above, the models are trying to predict something that they can get ground truth about (typically: buying behaviour), often based on data obtained minutes later. That means all of them monitor model accuracy, typically off-line. In most cases, they also monitor the impact of using the model: better recommendations should lead to better conversion, but also, say, a higher MRR.
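As a rough idea of what that kind of accuracy monitoring can look like once the ground truth arrives, here is a small rolling-window tracker. The class name RollingAccuracy and the window size are my own choices for illustration; this is not a description of what either company actually runs.

```python
from collections import deque


class RollingAccuracy:
    """Tracks the share of correct predictions over the most recent outcomes."""

    def __init__(self, window: int = 1000):
        self.outcomes = deque(maxlen=window)  # old outcomes fall off automatically

    def record(self, predicted, actual) -> None:
        # Called once the ground truth (e.g. bought / did not buy) is known.
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)


tracker = RollingAccuracy(window=3)
for predicted, actual in [(1, 1), (0, 1), (1, 1), (1, 1)]:
    tracker.record(predicted, actual)
print(tracker.accuracy())
```

A sustained drop in that number is the cue to look at whether the incoming data has drifted away from what the model was trained on.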