Hacker News new | ask | show | jobs
by ic_fly2 733 days ago
With all the data issues strong quality and normalisation I often get the impression that enabling more people with non CS backgrounds to do this work is not necessarily a good thing.

In other words, if writing python and sql is the skill requirement that stops you from making an etl pipeline, maybe do something else.

5 comments

While I’m not usually on board with gatekeeping, this field already struggles with a huge amount of very non-technical folks and their respective managers, producing overall mediocre results and giving the profession a bad rep, to the point where I now avoid the Data Engineer title and just call it “SWE specialized in large data processing” or something equally as fluffy.

For me it’s more accurate, too. At $work, there’s no difference to how an SWE vs a “DE” works. Same interview process too, DSA, distributed systems etc.

However, having done this for more than a decade, that is relatively rare. It’s usually a mix of GUI tools with zero reproducibility / infra-as-code, untyped python, copy pasted shell scripts, zero tests, zero ci/cd, no lifting/static analysis/code reviews etc., paired with generally zero understanding of the underlying tech. It’s all very formulaic with little to no actual understanding.

I will spare you my usual rant on why a language without a solid type system like python is a horrible idea for this field, too.

Which is why I much appreciate dbt. While some people scoff at the idea of “SQL with jinja templating”, their approach has certainly helped to move DE closer to SWE work, purely by virtue of their value prop mostly being exactly that. And it works out great.

So if Bob from accounts needs a new report generating, he has to wait for 6 month for an IT guy to do it? Who probably won't do a very good job, because he doesn't understand what Bob needs as well as Bob does? Bob is going to hack something horrific together in Excel instead. Better surely to let Bob have a GUI point and click tool more appropriate to the job?
On the other hand, Bob keeps asking for a self-serve reporting tool but in my experience, he doesn't actually want to use it. We went the route of putting all the data into a lake and hooking up GUI reporting tools to it and what did we get? Bob doesn't understand this or that column, Bob made a report that is fundamentally flawed, Bob sent a request for the engineers to make him a report using the tool and so on. Bob wanted something, or someone else, to do the work for him. When it became apparent that the tool isn't magic and can neither read minds nor divinate the true meaning of data in the DB, it became the engineer's problem again. So why not let engineers use the tools they prefer?
>On the other hand, Bob keeps asking for a self-serve reporting tool but in my experience, he doesn't actually want to use it.

Bob can't have it both ways. ;0)

>So why not let engineers use the tools they prefer?

Empowering end-users shouldn't mean forcing engineers to use the same tools, against their will.

Bob doesn't write a specification, because Bob doesn't know as well what he wants. He will have to explore, try out until he reaches something that he can work with. Nobody is willing to spend time planning and documenting stuff. Everyone feels one youtube away from being expert in software development.
With this argument, Computer Science wouldn’t have progressed beyond assemblers.
I sort of see your point. if you’re not willing to at least try to learn something new then, yeah, probably better off doing something else.
This is elitist and frankly, unhelpful. The answer to a skills shortage is not a practitioner lockdown, but policy, training, guidance and mentoring. If you're stuck in start up land and you have this issue, you have hired the wrong skills. If you're encountering this in enterprise land, your organisation, and potentially you depending on your position of influence, should be angling to improve compliance and literacy not through obstruction but through policy and upskilling. Failing to do so will kill your ability to innovate.
FWIW, while I disagree with the parent comment, I don't see you arguing against it.

They actually implied that you should try upskilling first — but if that fails, you shouldn't be doing ETL yourself.

I mostly disagree with the parent comment because there's so many things one can easily do up to a level, and then when the going gets tough, you need to call in an expert. Eg. most people can operate a screwdriver or impact driver to fix things, but to fix some problems, you really need a trained technician (or well, an experienced DIY person, but that's not everybody).

The fact that you are not strong enough to screw in an M14 bolt does not mean you should be forbidden from using an impact driver: tools are there to help you. The logic of the parent comment was seemingly that if you are not strong enough to tighten an M14 bolt, you probably don't know what you are doing regardless of the type of the bolt you are tightening, so you should simply not do it.

The point I agree with in a parent comment is that not everybody can achieve a similar level of proficiency: while upskilling and improving/simplifying tools can get you most of the way there, there's always going to be that extra bit that requires a sudden, sharp jump in knowledge, smartness or experience to be able to deal with it.

I’m all for upskilling.

On your example, when I was an intern in a factory, I was banned from the pneumatic tools with a counter grip. Because if you use them incorrectly your finger/ hand / arm is a flesh pancake.

My solution suggestion here is definitely to empower more people to work on this by teaching them basics of data base design and enough python to write a dag in airflow.

>This is elitist and frankly, unhelpful. The answer to a skills shortage is not a practitioner lockdown, but policy, training, guidance and mentoring.

I think the point is that these tools have their own learning curve, and non-tech business people are not doing it well, either; how much different is it from learning SQL? Which one is more broadly valuable and transferrable as skill?

If this is the career you want (data or data-adjacent), why not just learn SQL? There are far more learning resources and the value of the knowledge will assuredly outlast any low-code tool.

Skills shortage?