by tumanian
2310 days ago
I run a team of data engineers, and over the years there has been a lot of confusion about what a data scientist is and what a data engineer is. I draw the divide this way: data scientists discover the features and the methodology, while data engineers take these insights to production. One can argue that data scientists could do that themselves, but this is constrained by domain expertise on tools (be that the depth of Spark internals or whatever) and the number of hours in the day. It's hard enough to deal with the variance of the models without also having to deal with the variance of the system. A good data engineer is a unicorn.

I define three central competencies for a data engineer:
be a good coder: quality, maintainability, efficiency,
know how to explore the data: SQL, R, just eye the damn data feed,
know enough data science to interface with scientists.

For a data engineer it's okay not to know probability theory and stats that much, but it's a must for a data scientist (running TensorFlow out of the box with no understanding of the underlying math doesn't make a data scientist, just a common butcher).