| HN Mirror



	by litzer 3236 days ago
	As somebody who has recently started learning more about ML, a lot of the work of an ML engineer does seem to be automatable (not doing research or pushing boundaries, but just applying ML to some product need). For example, choosing hyperparameters, evaluating which features to collect, etc. seem to be things that can be automated with very little human input. His slide on "learning to learn" has a goal of removing the ML expert from the equation. Can somebody who's more of an expert in the field comment on how plausible that is? Specifically, in the near future, will we only need ML people who do research, because the application side becomes trivial once automated?
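To make the hyperparameter point concrete, here is a minimal sketch of what "automated" selection can look like: a plain grid search over one regularization strength, scored on a held-out split. Everything in it is an assumption for illustration (the ridge model, the synthetic data, the names `ridge_fit`, `grid_search_alpha`, and the candidate `alphas`); it is not taken from the talk being discussed.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mse(X, y, w):
    # Mean squared error of the linear prediction X @ w
    return float(np.mean((X @ w - y) ** 2))

def grid_search_alpha(X_train, y_train, X_val, y_val, alphas):
    # Fit once per candidate and keep the one with the lowest validation error
    scores = {a: mse(X_val, y_val, ridge_fit(X_train, y_train, a)) for a in alphas}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data (hypothetical): 10 features, linear signal plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.5, size=200)

X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, scores = grid_search_alpha(X_train, y_train, X_val, y_val, alphas)
print("best alpha:", best_alpha, "validation MSE:", scores[best_alpha])
```

The loop needs no human judgment once the candidate grid and the validation split are fixed, which is the part that seems easy to automate. Deciding what to put in `X` in the first place is a separate question.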

2 comments

wenc 3235 days ago

There is one job that is still difficult for a machine to do well (although machines are improving): feature engineering.

ML works very well in bounded/closed domains like image and sound recognition. Open domains are much more challenging.

Building predictive models from data in specialized domains often requires insight, which machines cannot provide. For instance, let's say you collect a bunch of data and are trying to predict sales. You need to apply domain knowledge, experience and intuition to know which variables are causal and which are merely correlated. If you just throw all the variables into the mix and build a model from that, you will end up with a model that overfits badly.

There are automated "variable selection" techniques that can help prevent overfitting, but they are mostly imperfect because machines can only detect correlation, not causation. Also, many regression/classification techniques are easily fooled by noise and highly nonlinear relationships. We did some work a few years ago comparing a predictive model built from a ton of sensor data (with automated variable selection) against a parsimonious one built on select data that we knew accounted for 80% of the effect. The latter model was far superior. Noise/non-causal variables often don't just "wash out", even with very good variable selection algorithms.

It takes domain knowledge to figure out what variables matter and what variables don't.
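A toy illustration of the overfitting point, as a sketch under stated assumptions: two causal variables, fifty pure-noise variables, a small training set, and ordinary least squares. This is synthetic data with made-up coefficients and hypothetical names (`make_data`, `fit_ols`), not the industrial sensor model described above, and it uses no variable selection at all.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test = 60, 2000
n_causal, n_noise = 2, 50  # assumed sizes, chosen only for illustration

def make_data(n):
    # Only the two causal columns drive y; the noise columns are unrelated to it
    X_causal = rng.normal(size=(n, n_causal))
    X_noise = rng.normal(size=(n, n_noise))
    y = 3.0 * X_causal[:, 0] - 2.0 * X_causal[:, 1] + rng.normal(scale=1.0, size=n)
    return X_causal, np.hstack([X_causal, X_noise]), y

def fit_ols(X, y):
    # Ordinary least squares via numpy's least-squares solver
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

Xs_train, Xa_train, y_train = make_data(n_train)
Xs_test, Xa_test, y_test = make_data(n_test)

w_small = fit_ols(Xs_train, y_train)  # parsimonious: causal variables only
w_all = fit_ols(Xa_train, y_train)    # kitchen sink: causal + noise variables

train_small, test_small = mse(Xs_train, y_train, w_small), mse(Xs_test, y_test, w_small)
train_all, test_all = mse(Xa_train, y_train, w_all), mse(Xa_test, y_test, w_all)

print(f"parsimonious  train={train_small:.2f}  test={test_small:.2f}")
print(f"all variables train={train_all:.2f}  test={test_all:.2f}")
```

The model with every variable fits the training data at least as well as the parsimonious one, since it contains the same columns plus more, but it does worse on fresh data because it has fitted the noise columns. Knowing which two columns to keep is the domain-knowledge part.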


pakl 3235 days ago

What was the architecture of your predictive model? Was it designed to learn the underlying physical dynamics from the tons of sensor data?


wenc 3235 days ago

It was a hybrid of several algorithms. Yes, it was an adaptive model trained with a large historical dataset and updated daily.


freddealmeida 3235 days ago

I think not. Any publications you could point to?


wenc 3235 days ago

I'm not sure what the "I think not" was in response to. This was an industrial application, so there were no publications.


halflings 3236 days ago

You will still need data engineers to build the whole data ingestion and processing pipeline. That can be easy when standardised tools such as Spark are available, but it's still a challenge in many cases.


litzer 3236 days ago

Right, but I'd consider that falling closer to the realm of general software engineering -- similar to tasks of collecting analytics of users or building infrastructure to get data from point A to point B.

Maybe that is currently part of the job of an ML engineer. But if that's the only part, I don't think the role should be called ML engineer anymore.


davedx 3236 days ago

I am working on solving this problem at the moment - I'm building a product that lets anyone build the ETL pipelines that produce inputs for an ML model. If anyone's interested in beta access (coming in a month or two), let me know: davedx@gmail.com
