Hacker News
by mattnewton 3574 days ago
I'm trying to switch careers into "Data Engineering" now, as a full stack developer who is more interested in ML, and I've found almost no traction internally at my company or externally. It looks like I may just accept a full stack position at a good company that does a lot of data science for now, but thought I would ask - Where are all these jobs?
7 comments

"Data Engineering" is most of the work that needs to be done, but I think companies haven't identified it as a category.

From my P.O.V., "Full Stack Engineer" is a place you don't want to be because it means putting out fires with whatever junk JavaScript is in the front end. It seems like everybody who's built a serious JavaScript application has invented their own Virtual DOM because none of the popular Virtual DOM libraries are good for much other than wasting time and CPU cycles.

"Data Scientist" is a bad title in its own way, in the sense that "Computer Science" is bad, but worse. To a lot of people there is a Brahmin kind of attitude associated with "Scientist" -- i.e. an aversion to getting your hands dirty. Real world data is pretty dirty and you aren't going to get far in getting value out of it unless you spend 80-90% of your time dealing with the dirt.

There are "Full Stack Engineers" doing pure native applications, which is what I have been doing the last three years after escaping the web back into native land.
You are correct. I thought before that full stack meant building the app start to finish, but the reality is often closer to putting out other people's fires in every layer. It does pay well though, and you learn a lot about what can go wrong.
The fact that it pays well makes it a job you're likely to get laid off from. Most managers would rather hire two junior developers so they can screw it up faster or better yet hire some people in another country who are really fast and cheap at screwing it up.
That may be true, but I'm not worried about that; I worry more about getting comfortable doing useless work. If I got fired it would be so much easier to go back to school, as the dream of lots of money while learning on the side would evaporate.
My official title is "Data Scientist" although I'm closer to the "ML Engineer" someone else mentions in a child comment.

Frankly speaking, if your company doesn't need a data engineer, it won't hire one or move you into that role. They likely don't, either, if you're experiencing this pushback -- data engineers often develop ETL pipelines or data warehouses, both of which are very useful if your company has a data team and very useless if it does not.

That said, you may want to move closer to my role. There's actually a shortage of data-savvy people who can also write production software, and you would nicely complement a more research-inclined data scientist or analyst -- someone with far more experience with research/analysis than development.

> There's actually a shortage of data-savvy people who can also write production software, and you would nicely complement a more research-inclined data scientist or analyst -- someone with far more experience with research/analysis than development.

I experience the same problem with shortage-at-price-X in the field you describe. I'm a machine learning engineer with experience in MCMC methods, but I also have a lot of low-level Python and Cython experience, some intermediate experience with database internals, and lots of experience writing well-crafted code for production systems.

There are basically zero companies willing to pay what I'm seeking (which is a salary based on my previous job and a few offers I got around the time I took that job). In fact, in some of the more expensive cities, the real wage offered is far lower than other markets.

I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing in New York. I've seen companies in Minneapolis offering $130k for the same kind of job -- and even that is still too low for Minneapolis! The same has been true in San Francisco as well.

Because these companies value you more for simply looking good on paper and looking good as a piece of office ornamentation when investors stroll through, and they view you as an arbitrary work receptacle closer to a software janitor than a statistical specialist, their whole mindset is about how to drive wages down.

Frankly, given the stresses of the job and the risk of burnout, I think it's actually a terrible time to be in the machine learning / computational stats employment field, despite all of the interesting new work and advances being made. The intellectual side is good, but the quality of jobs is through the floor.

"I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing [in NY/SF]"

Man, do I ever agree. This is where the "shortage" argument falls apart.

This is why I'm so uninterested in the abstract arguments happening elsewhere on this topic about whether markets are failing and basic laws of supply and demand no longer apply at theoretical salary levels (10 million was offered as an example).

Why are we bothering with this debate, when it's so far from reality? I'd say that if you're trying to hire a very highly skilled and critical tech worker in SF, and you just can't find one no matter how hard you try, and then I find out that you're only offering 140k a year?

In San Francisco and New York (and anywhere else in the US, really), that's nowhere close to the kind of pay where we should start scratching our heads about a shortage and start wondering why the usual laws of supply and demand aren't working anymore.

Yeah, I strongly believe companies haven't figured out (or aren't willing to figure out) the IC track problem for data people in the way they've figured it out for engineers. Part of me wonders if it even makes sense for them to figure it out, if they're not an Uber/Netflix/Amazon with a strong need for advanced ML abilities.

It sounds like you're a principal/lead/post-senior ML engineer; at that level, you can easily command more than $140k but you have fewer options to apply those skills at companies that really need them (because few companies actually need them).

I don't know. It's tough. I agree that it might be a terrible time to work in ML/computational stats because of stuff like this.

I suspect the reason is those companies offering $140k frankly don't need that level of expertise. With that kind of background it would be fairly easy to get 200-300k as an infrastructure engineer at a quant shop.
Oh, also: if you're in NYC I'd be happy to meet over a coffee/beer to swap stories. Feel free to use the contact info in my profile.
I think the company does need data engineers but wants someone with a graduate degree from Stanford or CMU in that position, even though the actual work is in building up infrastructure for those people. And I understand. I've only really got software engineering skills to contribute at this point and I'm picking up the ML from kaggles on the side; I am looking for a position that can increase my overlap between those, because learning at home while working on unrelated stuff is making me move slowly and painfully. Your experience sounds exactly like what I'm looking for - data-savvy writing production code, complementing a research-heavy team I can learn from. How did you get started in that?
I honestly fell into it by luck. I moved to NYC, studied machine learning in grad school, networked my ass off, and landed an internship.

From there I went full time as something of an ML engineer at a company with a strong tech culture, and learned as much as I could in both tech and ML/statistics. The rest is history (although I'm by no means a rockstar or whatever).

My path is hard to reproduce -- it starts with being in NYC or SF at a specific point in time, before the labor market became saturated with data science bootcamps and PhDs furiously learning Python while working on their dissertations.

Your best bet at this point is to produce a few data-related projects (maybe work on open source like scikit-learn and pandas?) and network like crazy. Someone somewhere will have a need for someone like you.

Thanks! I guess it's somewhat reassuring that it's hard to break into for everyone and I'm not just dumb :) I'll keep kagglin'
> There's actually a shortage of data-savvy people who can also write production software

Well no kidding, that's one person doing two jobs. That's easily a 5-10 year training time depending on how high a quality you demand from their production software.

We (Kaggle) run a data science jobs board (https://www.kaggle.com/jobs) that gets a few data engineer listings from time to time. Not all of these are active, but you may find a few interested companies via - https://www.google.com/#q=site:https://www.kaggle.com/jobs+%...
Thank you guys! Doing Kaggle competitions is what got me interested in seriously pursuing ML in the first place. You are all seriously awesome.

I'll look again at the board, but I didn't see anything there before that wanted software engineering skills (which I have with industry experience), and not a graduate degree (which I don't), and happened to be commutable from my place just south of the bay. But I will keep looking!

I see tons of them. If you're interested in ML, you're probably looking more towards data science. Data engineering (in general) is more about getting the data into a state where it can be used (extracted, cleaned, moved, transformed, etc.), at least from what I've commonly seen in the industry. A decent breakdown is here: https://blog.insightdatascience.com/data-science-vs-data-eng...
You might want to look at "Machine Learning Engineer" positions if you want to do ML in practice; it's starting to be a title I see somewhat often now.

As others have pointed out, Data Engineering is more about building data pipelines, making architecture decisions for your ML stack, things like that. It's less about model building, prototyping and training, which is what I think of when somebody says they 'do' ML.

Right, I'm not picky about the title. I'm looking at those positions too. The main thing is, I want to be able to contribute using my existing software engineering skills from day 1, while picking up the ML stuff. It's been really hard to basically work an unrelated job during the day and go home and do kaggles for practice, so I am hoping to get more of an intersection as a launching place. Anything touching the data or the models will do :)
ML falls more under a Data Science role than Data Engineering, although ML is much more difficult without proper Data Engineering.
You should put your email in your profile. If you're in Seattle, send me an email.