| Posts like these validate my concern about the people entering my field right now. Data science, as a line of work, is distinct from other technical roles in its focus on creating business value using machine learning and statistics. This quality is easy to observe in the most successful data scientists I've worked with (whether at unicorn startups, big companies like my current employer, or "mission-driven" companies). Implicit in this definition is avoiding the destruction of business value through misapplied ML/statistics. In that sense, I am concerned about blog posts like these (which list 50 libraries and zero textbooks or papers) and about commenters who question the relevance of "real math" in the era of computers. Speaking bluntly: if you are a "data scientist" who can't derive a posterior distribution or explain the architecture of a neural network in rigorous detail, you're only going to solve easy problems amenable to black-box approaches, which is code for "toss things into pandas and throw sklearn at it". If that describes you, I would look for a different line of work. |
–Medium Stats/ML, medium Engineering ("Data Scientist" or "Data Engineer")
–High Engineering on very large datasets, low/medium Stats/ML ("Data Engineer" or "Backend Engineer")
–High Analysis, medium Stats/ML, low Engineering ("Analyst")
–High traditional Stats, high Analysis, low ML/Engineering ("Statistician")
–High ML, medium Stats, medium Analysis ("Data Scientist")
–High ML, medium Engineering ("Machine Learning Engineer")