by winchester6788 2130 days ago
> In fact, most data science and ML engineers are quite skilled in systems engineering, because you have to do so much work with GPU hardware issues, underlying scientific package management, efficient data transportation, etc.

Unless your data scientists are expected to build the machines they use, they won't be dealing with any hardware issues at all.

Literally every data scientist at big companies uses pre-configured VMs/notebooks in the cloud.


This is just deeply wrong. Most data scientists hate Jupyter notebooks and are well aware of the flaws of the paradigm: poor modularity, poor testability, etc.

As an ML engineer you spend a lot of your time dealing with CUDA installations; custom compiler flags and builds of things like TensorFlow; the deep internals of Docker image builds to make these environments reproducible; image processing software with OpenCV and tons of cross-platform and software packaging headaches; writing efficient queries and understanding data structure implications for Spark, Arrow, HDFS, Presto, Postgres, etc.; and standing up things like TensorBoard for telemetry of ML training systems, deploying MLflow or Kubeflow in Kubernetes, and so on.

The myth of data scientists as notebook jockeys is just one more symptom of the denial of SRE orgs to admit ML engineers are great system engineers, and of their attempts to control them with parochial devops requirements that come from outside specializations.

> The myth of data scientists as notebook jockeys is just one more symptom of the denial of SRE orgs to admit ML engineers are great system engineers

It's most obvious with this statement, but overall you seem to think "ML engineer == data scientist", which just isn't the case.

That sounds like a No True Scotsman fallacy to me. You’re trying to define “data scientist” as someone who only knows how to use notebooks, fails to treat testing as a first-class consideration, etc., but that describes only a small minority of people with the job title of Data Scientist.

I manage teams of both ML engineers and Data Scientists and have designed hiring processes for both within large ecommerce companies for years.

> You’re trying to define “data scientist” as someone who only knows how to use notebooks

I'm not the other person, and that is definitely not what I'm saying. There can be overlap, but the distinction is important: Data scientists use tools for analysis, ML engineers are capable of building those tools.

For example, the tool I'm aware of that some of our data scientists use is SPSS - but they have no programming experience, and could not remotely be grouped in with "ML engineers".

I understand, I’m just saying that the “SPSS only” type of data scientist (glorified business analyst) you describe is very rare in industry and it’s not a useful broad brush to paint the field of data science with - it’s a greatly exaggerated and overrepresented stereotype.