Ask HN: Steps from Analyst to Data Engineering?
6 points by quokkafriend 1628 days ago
For someone who has strong SQL skills and some scripting experience, what would be the best approach to shifting into data engineering?

A constraint is that current place of employment does not offer transfer or mentorship opportunities.

If you were to recommend the top 2-3 actions to take to enter the field and gain employment, what would they be? (self-study resources, courses, projects, bootcamps)

3 comments

I took a similar route, albeit much less intentionally and spanning almost a decade. Software QA -> Business Intelligence Analyst -> Data Scientist -> Data Engineer.

Here's what I'd recommend today:

1. Get very comfortable with Python. Scripting isn't enough; you'll need good OO principles, an understanding of how to manage projects/libraries/dependencies, etc. This will take the longest, so start it first.

2. Read and re-read Designing Data-Intensive Applications by Kleppmann. This is the bible of data engineering and far outclasses anything else currently available.

3. Get your hands dirty with modern tools and the whole data lifecycle. dbt, Airflow, Snowflake, Postgres should be obvious (feel free to substitute Prefect, ClickHouse, etc. if desired). You'll also want familiarity with a cloud stack and how to manage it (Terraform, Pulumi, or CDK). A public portfolio project would be great, but being able to talk confidently about the how and why of these things is probably enough.

The hard part is getting that next job. Look for junior roles at big companies, and mid-level roles at startups who don't understand the data ecosystem yet (almost any startup whose product is not ML or ELT). The former will give more mentorship, the latter will be easier to get if you can talk the talk in an interview.

>you'll need good OO principles

Could you please give more details on why this is important? I have good experience working with data and data science, and a little bit of data engineering too, but I never saw the necessity for OO. I'm also very interested in data engineering and was wondering why you mentioned OO and why it is important for data engineering.

Thank you.

I've been there too, especially since most of my pre-eng work was in R. In Python at least, if you want to write code that others can use and extend, you need to embrace objects. Nothing fancy necessarily, but you should have a sense of how to organize a class hierarchy for a given program, and when and how much inheritance to use. To do that you'll touch on a lot of small things like mixins, MRO, ABCs, etc.
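To make those terms concrete, here is a minimal sketch of an ABC, a mixin, and the MRO working together. All class names (`Source`, `LoggingMixin`, `InMemorySource`, `LoggedInMemorySource`) are made up for illustration and don't come from any particular library.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    """Abstract base class: every concrete source must implement read()."""

    @abstractmethod
    def read(self) -> list:
        ...


class LoggingMixin:
    """Mixin that records each read, then defers to the next class in the MRO."""

    def read(self) -> list:
        rows = super().read()  # resolved via the MRO of the concrete class
        self.log = getattr(self, "log", []) + [f"read {len(rows)} rows"]
        return rows


class InMemorySource(Source):
    """Concrete source backed by a plain list of rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def read(self) -> list:
        return list(self._rows)


class LoggedInMemorySource(LoggingMixin, InMemorySource):
    """MRO: LoggedInMemorySource -> LoggingMixin -> InMemorySource -> Source."""


if __name__ == "__main__":
    src = LoggedInMemorySource([{"id": 1}, {"id": 2}])
    print(src.read())
    print(src.log)
    print([cls.__name__ for cls in LoggedInMemorySource.__mro__])
```

The mixin is listed first so that its `read()` wraps the concrete one; swapping the base order would skip the logging entirely, which is the kind of MRO detail worth getting a feel for.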

One way you may be forced into this is custom Airflow operators; the community now recommends writing ~0 logic in Airflow and sticking it all in Docker instead, but any team using Airflow for more than a few years has a tangled web of custom bullshit you'll be expected to maintain and extend.
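As a rough sketch of the shape that kind of code tends to take: a base class owns the run logic and subclasses override one method. Note that `BaseTask` and `RowCountCheck` below are hypothetical stand-ins written in plain Python, not Airflow's actual classes or API, so check the Airflow docs before copying the pattern onto a real operator.

```python
class BaseTask:
    """Hypothetical stand-in for an orchestrator's operator base class."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self, context):
        # Subclasses put their actual work here.
        raise NotImplementedError

    def run(self, context=None):
        # Shared behavior lives in the base class: build the context, then delegate.
        context = dict(context or {})
        context["task_id"] = self.task_id
        return self.execute(context)


class RowCountCheck(BaseTask):
    """Custom task: fail if a table has fewer rows than expected."""

    def __init__(self, task_id, fetch_count, min_rows):
        super().__init__(task_id)
        self.fetch_count = fetch_count  # assumed: a callable returning an int
        self.min_rows = min_rows

    def execute(self, context):
        count = self.fetch_count()
        if count < self.min_rows:
            raise ValueError(
                f"{context['task_id']}: expected >= {self.min_rows} rows, got {count}"
            )
        return count


if __name__ == "__main__":
    check = RowCountCheck("orders_check", fetch_count=lambda: 10, min_rows=5)
    print(check.run())
```

Maintaining a codebase full of these mostly means reading what the base class does for you and knowing which method you are supposed to override.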

You can certainly write a lot of Python in a more procedural and/or functional way, but if you ask a Python engineer to use or modify that code, don't be surprised by their anger.

Thank you so much for your answer!

I have a general understanding of OO in Python. I just do not see where exactly to use it in data engineering. Could you please recommend any book/article/video that shows, with examples, when to use OO in data engineering tasks?

This is perfect, thanks.

On #3, do you know of any public GitHub repos or codebases that are good to reverse engineer and learn an end-to-end pipeline from?

Always like to supplement book learning with real world production-level examples, especially in this case where I don't have access to that where I work.

Are you doing any data engineering at your current job? Anything you can lean into more? My first data analyst job was essentially a mistitled data engineering job.

One challenge that you need to overcome is that many data engineering roles require experience with specific tools and that junior positions aren't quite as common.

I also think that you should consider backend software development as an option. It's a similar role (depending on the type of DE role you compare to) but with better opportunities for progression and more career flexibility; it's also been around much longer than DE which is a relatively new job title. Another weird thing is that some DE roles have lower status in some companies. Especially if you get to work with data scientists, there's sometimes the perception that DS are the smart people and DE are the grunts that do the plumbing. The whole thing is obviously very misplaced but it does exist in some places and you should be aware of it.

I see, thank you. A lot to ponder here.

I'm actually a PM with a good analytics background that is contemplating my options.

I was considering DE because it seemed maybe more of a feasible leap than straight to backend SWE.

Thank you. A lot to ponder.

I'm actually a PM with analytics experience, so I thought that DE would be an easier path in than backend SWE...

If you already have experience, just apply.