by boringds 1929 days ago
I think you nailed it. Often companies and execs want ML but don't have the basics: a robust ETL pipeline, clean data, and a solid analytics foundation (dashboards, automated reporting, etc.). These appear boring to most, but they will be the difference between a useless ML department that can't ship anything to production and a successful one that builds on top of those foundations.

In addition, I believe it's time we drop the term "data science". It's an umbrella covering different roles, ranging from data engineer to DL researcher. Companies need to identify what they REALLY need and not go for the shiny PhD in ML.

The emergence of analytics engineering is the perfect example of this shift towards building robust data pipelines first and enabling "data scientists" to build them too.

I wrote a blog post about it yesterday. I don't want to post it here and self-promote too much, so check my profile if you want to read it.

3 comments

Here's boringds's post, for anyone else who's curious: https://boringdatascience.com/post/data-science-is-dead-long...
I love the idea of "boring data science." I will steal that term for my own use.
Who do you think has the best 3rd party solutions for data cleaning?
Data cleaning is domain specific. Hire someone to do it and accumulate wisdom over time.
For data cleaning I swear by dbt (https://www.getdbt.com/). It's such a powerful tool, and you can put it in the hands of anyone with SQL knowledge. It allows us to develop clean pipelines and to document and test our data easily. It's also free and the team working there is amazing.
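For anyone who hasn't seen what "testing your data" means here: in dbt you declare tests such as not_null and unique against a column in YAML, and dbt runs them as SQL against your warehouse. The snippet below is not dbt code, just a rough plain-Python sketch of what those two checks amount to. The rows, column names, and helper functions are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows; in practice this would be a table in your warehouse.
rows = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 12},
]


def not_null_failures(rows, column):
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# A check "passes" when it returns nothing.
print(not_null_failures(rows, "customer_id"))
print(unique_failures(rows, "order_id"))
```

The point is that each check is a small query returning the offending rows or values, so a failing test tells you exactly what to go and clean.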