So we had this idea for a new feature for our product. The only way to quickly do it was to somehow implement a machine learning algo and that would give us the result that we wanted. Voilà! It seemed simple.
Now our company doesn't have any machine learning expert or data science genius. Hiring one would take time. Taking someone on contract would be very expensive (our CEO wasn't ready to shell out that kinda money). So the task fell on me. They asked me to go through the multitude of machine learning MOOCs out there and get a working prototype ready in 2 weeks.
I had already done Andrew Ng's course back when it first came out. But my memory had faded from lack of practice.
I went through the course again. I went over a couple of online ML books too.
Then I started thinking of the problem at hand. Unfortunately, it turned out to be a chicken and egg problem. For the feature to work perfectly we needed a large amount of training data to train our models.
But without the feature actually deployed, we didn't have any way to collect any training data.
So we ultimately fell back to a simple algo that made its decisions based on a few hard-coded rules. Things have been working fine so far.
They gave you two weeks to become a data scientist and implement a working solution? That's nuts. I'm still pretty early career, but I have done data science work for about four years now and I would've quoted at least two months to figure out the data, clean it, feature engineer, run models, compare results, and then deliver the best-performing solution.
No data is better than 10 years of useless data. I’d much rather be in the position of designing the data collection (experimental design ftw) than trying to fix the problems with an overly complicated modeling project. Buuut, I am a statistician.
In my experience, having someone who knows what they’re doing on the front end of a study, design-wise, can save weeks or months of work on the back end of a study or project.
> They gave you two weeks to become a data scientist and implement a working solution? That's nuts.
Oh c'mon. At any large company today, the expectation or deadline for practically anything is "asap" or measured in a few weeks at most. Short-term thinking is a major player in publicly traded companies. That is what opens the door for startups to play the long game.
> Unfortunately, it turned out to be a chicken and egg problem. For the feature to work perfectly we needed a large amount of training data to train our models. But without the feature actually deployed, we didn't have any way to collect any training data.
Everyone outside of data science seems really surprised by this and I can't count the number of times someone has asked me to build an algorithm for X but has none of the data to support doing so. It doesn't mean the feature/product can't be built but they often want a supervised learning solution without the cost (and time) of acquiring the ground truth data.
> The only way to quickly do it was to somehow implement a machine learning algo and that would give us the result that we wanted.
Since none of you had any experience with ML, how did you know that an ML algo (which one?), implemented "somehow", would give you the results you wanted? (Not a cynical comment; I am really interested in hearing about this.)
Co-author here. This is a surprisingly common situation. In fact, starting with the simplest algo is usually the best way to prove the validity of your approach and gather initial data to build a more complex model later.
In addition, getting the feature to “work perfectly” from the get-go is usually quite hard, even with lots of data.
Maybe it's an instance of "when all you have is a hammer...", because I'm learning about it right now, but you could look into transfer learning: you train an ML model on a similar, easier task, and then you fine-tune it on your data.
That said, there's a good chance that your current algorithm is all you will ever need. Many times an ML project is overkill, and you already have good results.
Transfer learning only works if the original model is in the same domain (e.g. ImageNet for images, GloVe for text). A bespoke problem likely won't have a widely-available original model.
That seems fine to me. It's a good practice to start with hard-coded business rules instead of any kind of model, just to test the waters, collect some data, and see if a new feature even makes sense, before diving into building even the simplest model.
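To make that concrete, here is a minimal sketch of the rules-first approach: a hard-coded decision function plus a logger that records the inputs and the decision, so there is data to look at later. Everything in it (the `Order` fields, the thresholds, the `decide_and_log` helper) is made up for illustration; nothing here comes from the original poster's product.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical input record; the fields are illustrative only.
    amount: float
    is_new_customer: bool
    country: str


def rule_based_flag(order: Order) -> bool:
    """Hard-coded business rules standing in for a model (thresholds are invented)."""
    if order.amount > 1000 and order.is_new_customer:
        return True
    if order.country not in {"US", "CA", "GB"}:
        return order.amount > 500
    return False


def decide_and_log(order: Order, writer) -> bool:
    """Apply the rules and append one row (features + decision) to a CSV writer."""
    decision = rule_based_flag(order)
    writer.writerow(
        [order.amount, int(order.is_new_customer), order.country, int(decision)]
    )
    return decision


if __name__ == "__main__":
    buffer = io.StringIO()  # stand-in for a real log file
    log = csv.writer(buffer)
    for order in [Order(1500, True, "US"), Order(200, False, "US")]:
        print(order, "->", decide_and_log(order, log))
    print(buffer.getvalue())
```

Note that the logged decision is the rule's output, not ground truth. An actual outcome column would still have to be joined in later before any of this could train a supervised model.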
I've been talking to academic neural net / ML experts in computer vision and OCR / NLP and the thing they try to stress is that for almost all cases an algorithmic approach works better.
I don't think most ML experts would agree with that. A big reason DL became popular is the huge improvement it brought to the CV and NLP fields.
In many ways, traditional approaches were harder because you needed a huge amount of domain expertise in CV & NLP, whereas an ML expert can solve simple CV problems with almost no domain knowledge.
Now, for a lot of business data, especially time series data, I agree that an algorithmic/heuristic approach is easier and more robust, e.g. for recommendation systems.
Yes, but before the ML step the old approaches relied on expert-crafted features. The breakthroughs in those fields via deep learning came because people found architectures (CNNs/RNNs) that could learn those features much, much, much more efficiently than they could be hand-crafted.
Machine learning is much more nuanced than people seem to understand. You can't just throw data at a net and expect results. This field requires a heavy degree of intuition, and engineers must be prepared for nets to pick up on patterns not obvious to humans, which can lead to unintuitive results.
Neural nets are basically black box heuristics, with unpredictable edge cases. Much like human reasoning, I'd warrant!
Co-author here. This post came out of a discussion with Adam, where we both realized that the advice we were giving to ML teams and ML engineers to guide them to better results was very often process-centric rather than model-centric.
Many resources exist online about how to get a model to converge, and that’s not usually what makes or breaks a project.
Data acquisition, augmentation, model selection, and iterative exploration, however, seem quite rarely discussed compared to how important we have seen them be. This is our attempt at sharing this outside of our usual circles.
Novelty per se means nothing; for business, the more standards the better. In ML/DL for B2B we badly need unified best practices and, above all, sensitivity (ablation) protocols to demonstrate our models are neither overfitting nor cherry-picking.
I hate quotes but there's a single one I'll ever use because it's not only accurate but incredibly useful: "People need to be reminded more often than they need to be instructed."
So we do the loop 50 times and we now have an algorithm that works (97%!) on the test set. We are happy! We run it in production and everything looks good (probably 92%-ish). Everyone is happy! We all get promoted or get new jobs. Then, one day, someone actually looks at what it's doing... and lo. It. does. not. work (~51%). Everyone is sad. Apart from us! Yay!
Seriously - an optimisation loop on a test set? Seriously?
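For anyone wondering what the alternative looks like, a rough sketch: split three ways, run the tuning loop against a validation set only, and touch the test set once at the very end. The toy data, the threshold "model", and the function names below are all assumptions for illustration, not anything taken from the article.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def accuracy(threshold, rows):
    """Fraction of (x, label) rows where the rule `x > threshold` matches the label."""
    return sum((x > threshold) == label for x, label in rows) / len(rows)


def pick_threshold(candidates, val):
    """The tuning loop: it only ever sees the validation set."""
    return max(candidates, key=lambda t: accuracy(t, val))


if __name__ == "__main__":
    # Toy data: the label is True exactly when x >= 50.
    data = [(x, x >= 50) for x in range(100)]
    # A real model would be fit on `train`; this toy rule has nothing to fit.
    train, val, test = three_way_split(data)
    best = pick_threshold(range(0, 100, 5), val)
    # Reported once, after all tuning decisions are final.
    print("chosen threshold:", best, "test accuracy:", accuracy(best, test))
```

Even this only protects against overfitting to the held-out sample. It does nothing about production data drifting away from whatever the split was drawn from.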
The point about hacking away at the code needs to be heavily caveated. It's too easy to conclude you've got negative or positive results when what you really have is a silly little bug. The lack of focus on implementation skills in data science (or even "real" science) is frightful. The one takeaway anyone trained in software engineering could share is that if you aren't very sure if it is working as intended, it's very likely not. Code review is very applicable here when making major pivots, even if unit or other testing is deemed too time-consuming for the train-test-improve loop.
Edit: typo "of" to "if". Somewhat serendipitous if you think about it.
Well, “Data Scientist” has been appropriated by the overflow of PhDs w/o any actual stats or computational backgrounds and few academia prospects, so I guess you need to create new job titles for those who are going to do the actual work.
I totally agree, and wasn't arguing that a new title wasn't necessary. And I'm ok with my downvotes for that comment :)
It's just funny that "Data Scientist" seemed to be originally branded as the more technical/engineer-y version of a data analyst. Now I get recruiters contacting me for "Data Scientist" positions that entirely revolve around SQL and Excel, and nobody in the Bay Area hires "Data Analysts" anymore.
Alright, guess it's time to update my LinkedIn and resume to adjust for this inflation? Maybe I should jump up a few inflation levels and just become a "Deep Learning Engineer."
I do not see any problem with that. There is a ton of confusion in the tech world regarding labels, who does what, and whether it is needed or not, outside of the core actions that need to be done. Laying off 50% of tech people from public tech companies might even result in a net positive for the companies. Not for a tech worker like me, so please do not tell them.
Taking advantage as much as possible of hype and other people's laziness is fine in my book. It is certainly not my duty from the outside to educate recruiters and business people who make hiring decisions in the field – when I tried, from the inside, to gently point out that what they were thinking did not make any sense, I just put myself in a dangerous spot. I can be a data scientist, deep learning engineer, machine learning engineer, machine learning research scientist, whatever pays more and is the most fun. If using an RNN instead of a more effective and efficient linear regression gives me more money and prestige, I will do it – as an IC you either go with the flow or you are not having a good time. The vast majority of us are not saving lives anyway.