by tumanian 2310 days ago
I run a team of data engineers, and over the years there has been a lot of confusion between what is a data scientist and what is a data engineer.

I draw the line this way: data scientists discover the features and the methodology, while data engineers take those insights to production. One could argue that data scientists could do that themselves, but in practice they are constrained by their depth of expertise in the tooling (be that Spark internals or whatever else) and by the number of hours in the day. It's hard enough to deal with the variance of the models without also having to deal with the variance of the system.

A good data engineer is a unicorn. I define three core competencies for a data engineer: be a good coder (quality, maintainability, efficiency); know how to explore the data (SQL, R, or just eyeballing the damn data feed); and know enough data science to interface with the scientists.

For a data engineer it's okay not to know much probability theory and statistics, but they are a must for a data scientist (running TensorFlow out of the box with no understanding of the underlying math doesn't make you a data scientist, just a common butcher).


I've seen the role you're describing (taking insights to production) come to be called a "Machine Learning Engineer", whereas Data Engineering sits closer to the front end of the process: productionising the _data_ gathering and organisation. I really like this diagram; it matches how I've seen roles advertised lately: https://twitter.com/workera_/status/1215081851577962497