HN Mirror


by wenc 3239 days ago

There is one job that is still difficult for a machine to do well (although machines are improving): feature engineering.

ML works very well in bounded/closed domains like image and sound recognition. Open domains are much more challenging.

Building predictive models from data in specialized domains often requires insight, which machines cannot provide. For instance, let's say you collect a bunch of data and are trying to predict sales. You need to apply domain knowledge, experience and intuition to know which variables are causal and which are merely correlated. If you just throw all the variables into the mix and build a model from that, you will end up with a model that overfits badly.

There are automated "variable selection" techniques that can help prevent overfitting, but they are imperfect because machines can only detect correlation, not causation. Also, many regression/classification techniques are easily fooled by noise and highly nonlinear relationships. We did some work a few years ago comparing two predictive models: one built from a ton of sensor data (with automated variable selection), and a parsimonious one built only on selected data that we knew accounted for 80% of the effect. The latter model was far superior. Noise/non-causal variables often don't just "wash out", even with very good variable selection algorithms.
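To make the point concrete, here is a minimal sketch on synthetic data. It is not the sensor data or the model from the work described above. It assumes two made-up "causal" drivers, 50 irrelevant columns, and plain least squares as a stand-in for whatever model you would actually use. The names `make_data`, `fit_ols` and `mse`, and the 0.2 correlation cutoff, are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, n_noise=50):
    # Two hypothetical causal drivers plus many irrelevant columns.
    drivers = rng.normal(size=(n, 2))
    noise_vars = rng.normal(size=(n, n_noise))
    # Only the two drivers affect the target; the rest is unit-variance noise.
    y = 3.0 * drivers[:, 0] - 2.0 * drivers[:, 1] + rng.normal(size=n)
    return np.hstack([drivers, noise_vars]), y

def fit_ols(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    # Mean squared prediction error on the given data.
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ coef - y) ** 2))

# Small training set, large held-out set.
X_train, y_train = make_data(60)
X_test, y_test = make_data(5000)

# "Throw everything in" vs. a parsimonious model on the two known drivers.
mse_all = mse(fit_ols(X_train, y_train), X_test, y_test)
mse_small = mse(fit_ols(X_train[:, :2], y_train), X_test[:, :2], y_test)

# Crude correlation screen (an assumption, not a serious selection algorithm):
# count irrelevant columns that look related to y on the training set alone.
corr = np.array([
    np.corrcoef(X_train[:, j], y_train)[0, 1]
    for j in range(2, X_train.shape[1])
])
n_spurious = int(np.sum(np.abs(corr) > 0.2))

print(f"held-out MSE, all 52 variables: {mse_all:.2f}")
print(f"held-out MSE, 2 known drivers:  {mse_small:.2f}")
print(f"irrelevant columns passing the screen: {n_spurious} of 50")
```

With only 60 training rows, the full model has nearly as many coefficients as observations, so it fits the training noise and does worse on held-out data than the two-variable model. The screen count shows how irrelevant columns can pass a purely correlational filter by chance.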

It takes domain knowledge to figure out what variables matter and what variables don't.

1 comments

pakl 3239 days ago

What was the architecture of your predictive model? Was it designed to learn the underlying physical dynamics from the tons of sensor data?

wenc 3239 days ago

It was a hybrid of several algorithms. Yes, it was an adaptive model trained with a large historical dataset and updated daily.

freddealmeida 3239 days ago

I think not. Any publications you could point to?

wenc 3239 days ago

I'm not sure what the "I think not" was in response to. This was an industrial application, so there were no publications.
