> Moreover, you may quickly realize much of this work is repetitive and while time-consuming, is “easy”. In fact, most analyses involve a great deal of time to understand the data, clean it and organize it. You may spend a minimal amount of time doing the “fun” parts that data scientists think of: complex statistics, machine learning and experimentation with tangible results.

This. Universities and online challenges provide clean labeled data, and score on model performance. The real world will provide you... “real data” and score you (hopefully) by impact.

Real data work requires much more than modeling. Understanding the data, the business and value you create are important. As per #6, better data and model infrastructure is crucial in keeping the time spent on these activities manageable, but I do think they’re important parts of the job. I’ve seen data science teams at other companies working for years on topics that never see production because they only saw modeling as their responsibility. Even the best data and infrastructure in the world won’t help if data scientists do not feel co-responsible for the realization of measurable value for their business.

Training integrative data professionals could be a great opportunity for bootcamps. Universities will (understandably) focus on the academically interesting topic of models, while companies will increasingly realize they need people with skills across the data value chain. I know I would be interested in such profiles. :)
Most people figured that with such a simple assignment (not significantly harder than the first one, which was also easy-ish) they could put off doing it until the last moment.
Most people failed.
This real-world data needed hours upon hours of cleaning before it was in any way usable. Of course, the teacher knew this, gave bonus points to those who did start on time, and then extended the deadline, as he had expected to from the start.
Never again will I underestimate the dirtiness of real-world data. One of the best teachers I had.