by currymj
1901 days ago
i think this actually gets at what makes applied ML distinct from statistics as a practice, even though there is a ton of overlap. statisticians make assumptions 1 and 2, and think of themselves as trying to find the "correct" parameters of their model. people doing applied ML typically assume they don't know 1 (although they might implicitly make some weak assumptions, like sub-gaussian tails, to rule out fat-tailed behavior) and also typically don't care about being able to do 2. they don't care about their parameters either; in a sense, to an ML practitioner every parameter is a nuisance parameter. instead you assume you have some reliable way of evaluating performance on the task you care about -- usually measuring performance on an unseen test set. as long as this is actually reliable, things are fine. you are right that in the face of a shifting distribution or an adversary crafting bad inputs, ML models can break down, but there is actually a lot of research on ways to deal with this, which will hopefully reach industry sooner rather than later.
This is the part that often fails in practice. Think of all the benchmarks that show superhuman performance, and compare that to how much worse those same models do in real use. Constructing a good set of holdouts to evaluate on is really hard and gets back to similar issues. In practice, doing what you're describing reliably (in a way that actually implies you should have confidence in your model once you roll it out) is rarely as simple as holding out some random bit of your dataset and checking performance on it.
And yet what you often see is people just holding out a random bunch of rows.
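
To make the gap concrete, here is a toy sketch in plain Python. Everything in it is made up for illustration: the data is synthetic, the "model" is just a lookup table that memorizes a label per group (think of a group as a user or a site), and the names `make_rows`, `fit_lookup`, `random_row_split` and `group_split` are hypothetical helpers, not any library's API. It assumes the rows come in correlated groups, which is only one of several ways a random holdout can mislead.

```python
import random
from collections import Counter, defaultdict

def make_rows(n_groups=200, rows_per_group=10, seed=0):
    """Toy data: (group_id, label) rows; the label depends only on the group."""
    rng = random.Random(seed)
    rows = []
    for group in range(n_groups):
        label = rng.randint(0, 1)
        rows.extend((group, label) for _ in range(rows_per_group))
    return rows

def fit_lookup(train):
    """'Model' that memorizes each group's majority label and falls back to
    the overall majority label for groups it has never seen."""
    votes = defaultdict(Counter)
    for group, label in train:
        votes[group][label] += 1
    table = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    return table, fallback

def accuracy(model, test):
    table, fallback = model
    return sum(table.get(g, fallback) == y for g, y in test) / len(test)

def random_row_split(rows, test_frac=0.2, seed=1):
    """Hold out a random bunch of rows, ignoring group membership."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def group_split(rows, test_frac=0.2, seed=1):
    """Hold out whole groups, so no test group appears in training."""
    groups = sorted({g for g, _ in rows})
    random.Random(seed).shuffle(groups)
    held_out = set(groups[: int(len(groups) * test_frac)])
    train = [r for r in rows if r[0] not in held_out]
    test = [r for r in rows if r[0] in held_out]
    return train, test

rows = make_rows()

train, test = random_row_split(rows)
acc_random = accuracy(fit_lookup(train), test)

train, test = group_split(rows)
acc_group = accuracy(fit_lookup(train), test)

print(f"random rows: {acc_random:.2f}  held-out groups: {acc_group:.2f}")
```

In this setup the random-row holdout should look close to perfect, because almost every test row has siblings from the same group in training, while the group holdout should sit near chance, which is what you would actually get on new groups after rollout. Real data is rarely this clean, but the same leak shows up with repeated users, near-duplicate documents, or time-ordered records.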