by arithmomachist 1901 days ago
What is the difference between data engineering and data science? The terms frequently seem to be used interchangeably, but apparently they're not synonymous.
2 comments

They aren't synonyms but they sometimes overlap.

- data engineering involves more work on data transformation and developing different pipelines

- data engineering requires more knowledge of databases, cloud environments or different streaming tools (it gets close to being a backend developer in some places)

- data engineering doesn't involve any statistical modeling; data science does

- data science is a broader term: depending on the company, a data scientist might do all the data engineering work (if there isn't too much of it) plus the modeling and statistics, or they might focus entirely on research, statistics and ML models

Thanks, that clarified it.

Do you know how people typically get into that role?

For us, it depends on the seniority of the role, but we've had good luck bringing in people from both directions (where I define the "directions" as "software engineering" and "data science/analysts").

Analysts and junior data-science types can often make the transition well if they can beef up their engineering skills (e.g. learn to write tests, build things that will be maintained for years).

Software engineers are often a good fit too if they can pick up some of the data skills (get really good at SQL).

It really depends on the specifics of the position: sometimes "data engineer" means "write SQL queries to apply business rules" and sometimes it means "maintain our interesting in-house ETL applications which were written in Java 8 years ago".
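To make the first flavor concrete, here is a rough sketch of what "a SQL query that applies a business rule" can look like. It uses Python's standard-library sqlite3 module. The orders table, its columns, the summarize_orders helper and the "large order means a total of at least 100" rule are all made up for illustration.

```python
import sqlite3

# Hypothetical business rule: an order counts as "large" when its total
# is at least 100. Table and column names are invented for this sketch.
LARGE_ORDER_SQL = """
SELECT customer,
       SUM(total) AS revenue,
       SUM(CASE WHEN total >= 100 THEN 1 ELSE 0 END) AS large_orders
FROM orders
GROUP BY customer
ORDER BY customer
"""


def summarize_orders(rows):
    """Load (customer, total) rows into an in-memory table and apply the rule."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(LARGE_ORDER_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("ann", 120.0), ("ann", 30.0), ("bob", 99.0)]
    for customer, revenue, large_orders in summarize_orders(sample):
        print(customer, revenue, large_orders)
```

Keeping the rule in one named query and wrapping it in a small function also makes it easy to test, which ties back to the point about engineering skills above.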

I'd also value soft skills a lot if I were hiring data engineers - so much of the job tends (at least where I am) to be correctly interpreting business rules/needs and anticipating potential future use-cases.

Hmm, so it might be out of reach for me. I have a PhD in pure math, and no experience as a software engineer. I've coded for research, but never for production.

You can move from software engineering to data engineering pretty easily. A data scientist who has had some exposure to that kind of work can make the move quite easily too.

I think you're right, the terms are pretty wishy-washy, but I'd define a data engineer as someone who builds systems that make quality, usable data available (anything from building ETL pipelines to productionizing models), versus a "data scientist", which I'd probably describe as doing more one-off, in-house research-type work.

I suspect a lot of "data scientists" end up being "people who write Tableau reports for other people" and/or "people who manage an ugly pile of Python data processing scripts to make the data-spice flow".

In my experience the plumbing is a lot more work (it takes more man-hours) than the interesting visualizations. I think some organizations do a good job of supporting a few scientists with a robust engineering staff, while others hire the scientists because they want the fruit, but forget to plant the fruit tree.