by IMTDb 1807 days ago
What would be the name of the position/profile of someone in charge of building the data warehousing architecture/ETL pipelines?

In my view, they need to make sure the warehouse model is a correct representation of the business and that it can be leveraged to answer basic or not-so-basic questions using SQL. They also need to promote its usage internally by ensuring it is accessible and easy to use, and guide other teams toward a more data-oriented mindset.

I feel that this is a specialised position not exactly similar to a developer, but every time I look for "data scientist" I get guys that want to do machine learning prediction models, which is not exactly the same stuff either.

11 comments

I would also vote for "data engineer" (it's my current job title).

You very likely don't want a data scientist to be doing a data engineer's job (and they probably don't want to be doing it themselves!). While there are similarities, data engineering tends to be a lot closer to software development than data science. If you're advertising for a data scientist role, don't expect them to be happy if 80% of their job is writing ETL scripts and cleaning datasets.

I think the reason there has been a flattening in data scientist job growth more recently is that lots of companies hired data scientists to build cool ML applications but had no infrastructure in place to support advanced data analysis. These companies didn't realize they needed to walk before they could run, and that what they really wanted was data analysts and engineers to build the foundation for a strong data science function.

Tools like dbt have been great for advancing an ELT approach to managing data pipelines, where modeling for BI tools, business users, and data scientists alike can all happen in the warehouse and ensure consistency in data usage across the company.

The one issue is that the gamut of experience and ability in a data engineer (and the salaries) is extremely wide, far wider than I’ve seen for any other role. Hiring a good DE is so hard!
Seconded.

I was a bit sad to not see any mention of a data engineer anywhere in the article.

Like, if you gave me access to all the prod tables and the warehouse I'd be having a whale of a time and (hopefully) delivering enough business value to automate some of the more regular "English to SQL" translations.

> You very likely don't want a data scientist to be doing a data engineer's job.

100%. This is one of those things that would make "disgruntled ML people" in the article want to leave.

This is spot on. As someone who has been looking for a data analyst role, I’ve actually read quite a few DS reqs that were geared more towards infrastructure and ETL. Then there's the flip side: DE reqs wanting NumPy and Pandas along with the infrastructure and ETL. Weird, right?
IMO data engineer roles split further into:

1. Kafka / streaming-oriented software engineering

2. data warehouse and ETL/ELT development for analytics

A good data engineer understands and can work with both of these.

They're both "data in, data out" mental models that are part of the Lambda architecture which every data engineer should at least know about [0].

But if you want a specialist streaming person to optimise all the streaming pipelines, then sure, hire a specialist.

[0]: https://en.m.wikipedia.org/wiki/Lambda_architecture
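For anyone who hasn't run into it, here is a toy sketch of the Lambda idea: a batch layer recomputes a view from the full dataset up to some cutoff, a speed layer covers whatever arrived after that cutoff, and the serving layer merges the two at query time. The event data, function names, and cutoff are all made up for illustration. Real setups would use a batch engine and a stream processor rather than in-memory lists.

```python
from collections import Counter

# Hypothetical event stream: (event_id, page) pairs, ordered by arrival.
events = [(1, "home"), (2, "pricing"), (3, "home"), (4, "home"), (5, "pricing")]


def batch_view(all_events, up_to_id):
    """Batch layer: recompute page counts from the master dataset up to a cutoff."""
    return Counter(page for event_id, page in all_events if event_id <= up_to_id)


def speed_view(all_events, after_id):
    """Speed layer: count only events the last batch run has not covered yet."""
    return Counter(page for event_id, page in all_events if event_id > after_id)


def serve(batch, speed):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch + speed


cutoff = 3  # assumed: the last batch run processed events 1..3
merged = serve(batch_view(events, cutoff), speed_view(events, cutoff))
print(dict(merged))
```

The point is just that both halves are "data in, data out": the merged result should match what a full recompute over every event would give.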

A new role has arisen in the last few years that captures much of this responsibility - Analytics Engineer.

This article by Claire Carroll describes the role and motivation for it https://www.getdbt.com/what-is-analytics-engineering/

I currently do that job as a Data Architect. It's kind of a mouthful lol, but it covers the gamut: understanding the entire business as an abstract set of data flows; being responsible for the ingest and outflows of data and for the level of quality in our overarching system; managing data engineers, developers, and business folks all accessing said data; and, at the end of the day, explaining what it all means to our clients and devs via standard modeling stuff and more targeted things as needed.
You mention that you manage data engineers. Where does your role not overlap w/ a data eng?
On our team it's mostly a difference of business focus and overarching responsibility: most data engineers I work with manage a major leg of the business and are responsible for their domain, but I am responsible for all of them.

I certainly spend time coding (especially because, again, small-to-medium startups can't afford anyone in the data space who isn't able to heave-ho), but much of it is translating pretty vague stuff into market research, a proof of concept, or an initial design of what will bring value to the business and scale alright, and then often more people will pitch in.

That being said, you can call me whatever you want, as long as it's not late for dinner :)

What about "data engineer"? There seem to be a lot of jobs for that title nowadays.
Yeah, we would call this Data Engineer (likely Senior level or up for someone that has had experience building multiple data warehouses) plus the DevOps/SRE work required to stitch all the architecture together.
You're mixing up two different tasks as I see it:

* Building/defining the data infrastructure

* Building/defining the schemas

In a traditional ETL infrastructure they are jumbled together but if you do ELT they are not. A data engineer can build the infrastructure but the transformations can be handled better by technical analysts. They're simply one view on the underlying data so the risk is minimal. Analysts query the data day in and day out so they know much better what they need than someone who doesn't.
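A rough sketch of that split, using Python's built-in `sqlite3` as a stand-in for a real warehouse. The table name `raw_orders`, its columns, and the view `paid_orders` are invented for the example. The load step lands the data untransformed (the engineer's side), and the transformation is just a view an analyst could own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# Extract + Load: raw data lands as-is (hypothetical table and columns).
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 800, "refunded"), (3, 4300, "paid")],
)

# Transform: an analyst-owned view over the raw table. Dropping or
# rewriting the view never touches the loaded data.
conn.execute(
    """
    CREATE VIEW paid_orders AS
    SELECT id, amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE status = 'paid'
    """
)

rows = conn.execute("SELECT id, amount FROM paid_orders ORDER BY id").fetchall()
print(rows)
```

Since the view is only one reading of the underlying rows, getting it wrong is cheap to fix, which is the "risk is minimal" part.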

The bigger issue is adaptability: can you migrate schemas while preserving older clients? Typically that’s done by providing a decent middleware layer. SQL views are one way, APIs are another, etc.

All of that while improving performance.
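To make the SQL-view option concrete, here is a small sketch with Python's built-in `sqlite3`. The names are hypothetical: assume the old schema exposed `customers(id, full_name)` and the migrated one stores `customers_v2(id, first_name, last_name)`. A view under the old name keeps legacy read queries working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New schema after the migration (hypothetical): the name is split in two.
conn.execute(
    "CREATE TABLE customers_v2 ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers_v2 VALUES (?, ?, ?)",
    [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")],
)

# Compatibility view: exposes the old shape under the old table name.
conn.execute(
    """
    CREATE VIEW customers AS
    SELECT id, first_name || ' ' || last_name AS full_name
    FROM customers_v2
    """
)

# A legacy client query, written against the old schema, still runs.
legacy_rows = conn.execute(
    "SELECT id, full_name FROM customers ORDER BY id"
).fetchall()
print(legacy_rows)
```

This only covers reads. Writes from old clients, and the performance side, need more than a plain view.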

Analytics Engineer is a clear one for this, as teej said.

The title is strongly associated with the dbt community, so it could imply you’re using dbt for your data modeling (not necessarily a bad thing, as it sounds like it would be a good tool for your use case).

I’ve done this for the past 6 years and my title was “Big Data Infrastructure Engineer”, but I don’t think there’s any consistency across companies from what I’ve seen.
This is what data engineers do, although that title is also used to describe data ops (maintaining clusters, running Kafka, etc.).
You pretty much described my job in a nutshell, and they call me "the database guy".
Most common would be a DevOps or SRE on an observability team.