by axg11 1554 days ago
Good overall post, but there is some conflation of data analysis with machine learning. The issue with conflating the two is that, in practice, many engineers want to do ML work but few want to do analysis work.

One additional factor for success:

Hire with the right expectations. If most of the value from the role comes from data analysis, communicate that. Often ML/data roles are filled by dangling the carrot of developing and deploying complex machine learning products. ML is sexy: a lot of managers want to manage ML projects and a lot of engineers want to work on them. In reality, most teams need someone who is good at SQL and can code a simple metric/heuristic. It's also important to communicate to the team that shipping simple solutions and simple analysis is a great outcome. Your analysis showed that you can achieve 90% of the initial goal with a simple if/else on one metric? Great! Deploying and maintaining ML models is hard and should be a last resort. I'm saying that as someone whose entire career depends on complex machine learning models.
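A minimal sketch of what "a simple if/else on one metric" can look like in Python. The metric (`days_since_last_login`), the 30-day threshold, the churn framing, and the five toy users are all hypothetical, made up purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class User:
    days_since_last_login: int  # hypothetical single metric
    churned: bool               # hypothetical ground-truth label


def predict_churn(user: User, threshold: int = 30) -> bool:
    # The entire "model": one if/else on one metric.
    return user.days_since_last_login > threshold


def accuracy(users: list, threshold: int = 30) -> float:
    # Fraction of users where the heuristic agrees with the label.
    correct = sum(predict_churn(u, threshold) == u.churned for u in users)
    return correct / len(users)


# Toy, made-up data for illustration only.
users = [
    User(2, False),
    User(45, True),
    User(10, False),
    User(60, True),
    User(35, False),
]
print(accuracy(users))
```

The point is that a baseline like this is cheap to ship, easy to explain, and gives any later ML model a concrete number to beat.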


In my experience, ironically, most of the model gains come from understanding and fixing data pipelines and datasets, tokenizers, and vocabs. It's surprising how a team can spend time on a complex model while nobody bothered to run stats and see that 20% of samples are garbage or that the top tokens are nonsense. So in this sense a lot of "ML" work is data analytics or code debugging. I usually say that we should work on products, and do whatever work is required to advance the product at the moment.
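A rough sketch of the kind of quick stats pass described here, in Python. The "garbage" rule (too short, or under half alphabetic characters), the naive whitespace tokenization, and the sample strings are hypothetical choices for illustration; a real pipeline would use its own quality criteria and run its actual tokenizer:

```python
from collections import Counter


def is_garbage(sample: str, min_chars: int = 5) -> bool:
    # Hypothetical rule: empty/very short, or mostly non-alphabetic characters.
    stripped = sample.strip()
    if len(stripped) < min_chars:
        return True
    alpha = sum(ch.isalpha() for ch in stripped)
    return alpha / len(stripped) < 0.5


def garbage_fraction(samples: list) -> float:
    # Share of samples flagged by the rule above.
    return sum(is_garbage(s) for s in samples) / len(samples)


def top_tokens(samples: list, k: int = 5) -> list:
    # Naive whitespace tokenization, lowercased; swap in the real tokenizer.
    counts = Counter(tok for s in samples for tok in s.lower().split())
    return counts.most_common(k)


# Made-up samples for illustration only.
samples = [
    "the cat sat on the mat",
    "",
    "<br/><br/>",
    "the dog ran",
    "@@@@####",
]
print(garbage_fraction(samples))
print(top_tokens(samples, k=3))
```

Eyeballing the flagged samples and the head of the token distribution usually takes minutes and can surface pipeline bugs that no amount of model tuning will fix.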
Yeah, I absolutely agree that a common recruiting tactic is dangling an ML carrot when the work isn't even ML-related.

I've historically seen the ML team kept separate from Data Science/Analytics. I wonder if that helps with this.

I think it's reasonable as an early-ish stage startup to say to a candidate that ~"eventually, with scale, there will be cool and impactful ML opportunities here" as long as you're realistic and upfront about the facts that ~"right now most of the impact is in simple but foundational analyses" and ~"there'll be some amount of fires to put out and rote work to automate".