| Yep, you and the user you're replying to are both right in different ways. One thing's for sure - machines don't generate "insights" on their own. Let's define an "insight" as "new meaningful knowledge", just for fun. We could talk about what comprises "new" and "meaningful" but it would be beside the point I'm making. In a supervised learning problem, the range of possible outputs is already known, meaning the model output will never be categorically different from what was in the training data. The knowledge obtained is meaningful as long as the training labels are meaningful, but it can never be new. Unsupervised learning doesn't have a notion of training labels, but that means an unsupervised model's output requires additional interpretation in order to be meaningful. It is possible to uncover new structures and identify anomalies in new ways, but this knowledge isn't meaningful until someone comes in and interprets it. Applied to the specific example where sensor data is used to try to generate insights about machine functionality: Either you can only predict the types of failures you've already seen, or you can identify states you've never seen but you wouldn't know whether they mean the system is likely to fail soon or not. It's the Roth/401(k) tradeoff. For model output to be useful, someone must pay an interpretation tax. The only choice is whether it is paid upon insight deposit or withdrawal. |
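To make the tradeoff in the quoted comment concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the sensor readings, the labels `ok` and `overheat`, the threshold of 2.0, and the helper names `centroids`, `classify`, and `anomaly_score`. It uses a toy nearest-centroid classifier as the supervised model and a nearest-neighbour distance as the unsupervised anomaly score; neither is meant as a recommendation for real sensor data.

```python
import math

# Hypothetical labelled readings: (temperature, vibration) -> label.
# Features are left unscaled to keep the toy example short.
TRAIN = [
    ((60.0, 0.20), "ok"),
    ((62.0, 0.30), "ok"),
    ((61.0, 0.25), "ok"),
    ((90.0, 0.30), "overheat"),
    ((92.0, 0.35), "overheat"),
]


def centroids(train):
    """Mean feature vector per label (the 'supervised' model)."""
    sums = {}
    for (temp, vib), label in train:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += temp
        s[1] += vib
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def classify(x, cents):
    """Return the label of the closest centroid. The answer is always
    one of the labels seen in training, whatever x looks like."""
    return min(cents, key=lambda label: math.dist(x, cents[label]))


def anomaly_score(x, train):
    """Distance to the nearest training point, labels ignored.
    A large value says 'unfamiliar', not 'about to fail'."""
    return min(math.dist(x, point) for point, _ in train)


CENTS = centroids(TRAIN)

# A state never seen in training: normal temperature, extreme vibration.
novel = (61.0, 5.0)

print(classify(novel, CENTS))             # some label from the training set
print(anomaly_score(novel, TRAIN) > 2.0)  # flagged as unusual, meaning unknown
```

The classifier has to squeeze the unseen state into a label it already knows, while the anomaly score notices that something is off but cannot say what. Someone with domain knowledge still has to decide whether high vibration at normal temperature matters.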
Yup, this is something I've seen from both sides. The first case you mention is basically the standard, while the second is part of the deep learning voodoo black magic that executives and sales love.
I've had people approach me with proposals like "What if we just churn [ALL OF] our data through this or that model and see if it comes up with some patterns we've never seen or thought about?"
And that's not just for industrial applications. It's everywhere.
What concerns me is that this mentality will surely lead to even more unrealistic expectations. Before you know it, business execs will start asking why we need business analysts at all, because surely those fancy deep neural networks can extract all kinds of features - "we only need data scientists to figure those things out".
So yeah, that's my fear: that businesses will blindly start to discard domain knowledge, just feed their data to black-box models, and leave the data scientists to wrestle with the results.