Hacker News
by syntaxing 1727 days ago
I have mixed feelings. On one hand, this is pretty awesome in the sense that more people should have easier access to ML. On the other hand, I’m guessing the people who can’t pipe a CSV to pandas + scikit will probably neglect important steps like data processing and output interpretation, such as balanced accuracy vs. accuracy.
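To illustrate the balanced accuracy vs. accuracy point, here is a rough sketch in plain Python. The labels are made up for illustration: an imbalanced dataset and a "model" that always predicts the majority class. Balanced accuracy is computed here as the mean of per-class recall, which is the same quantity scikit-learn's `balanced_accuracy_score` reports.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Made-up data: 95 negatives, 5 positives, and a model that always says 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f} balanced_accuracy={bal:.2f}")
# prints: accuracy=0.95 balanced_accuracy=0.50
```

The 95% looks great until you notice the model never finds a single positive, which is exactly what the 50% is telling you.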

I like to say "the dollars are in the data". Tools like these do open up ML to a wider audience, but they do nothing to help that audience understand that nothing beats proper record keeping and data collection. Furthermore, they handicap you in certain ways. For example, how are missing values handled?
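As a small illustration of why the missing-values question matters, here is a sketch in plain Python. The column and its values are hypothetical, and the three strategies are just common options, not a claim about what any particular tool does.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical column with two missing readings, marked as NaN.
ages = [34.0, float("nan"), 29.0, 41.0, float("nan"), 36.0]

# Strategy 1: drop rows with missing values (keeps 4 of 6 rows).
dropped = [x for x in ages if not math.isnan(x)]

# Strategy 2: fill gaps with the mean of the observed values.
observed_mean = mean(dropped)
mean_filled = [observed_mean if math.isnan(x) else x for x in ages]

# Strategy 3: fill gaps with zero.
zero_filled = [0.0 if math.isnan(x) else x for x in ages]

for name, col in [("drop", dropped), ("mean", mean_filled), ("zero", zero_filled)]:
    print(f"{name}: rows={len(col)} mean={mean(col):.2f}")
```

Dropping loses a third of the rows, while zero-filling drags the column mean from 35 down to about 23. If the tool picks one of these silently, you never find out which.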
Totally agree. Getting actionable, confident results from ML is a really tough thing to do. Imputing data, feature processing, metric choices, etc. are almost an art form (though new AutoML libraries have made this much easier).
Totally understand your mixed feelings. We are trying to do as many standard data preprocessing steps as we can (cleaning, normalizing, one-hot encoding, etc.) and are now switching over to an AutoML engine. We basically see the outputs of Magicsheets as great baseline models. Data scientists could definitely do better (especially with domain knowledge), but we should at least be able to give some useful predictions back for most problems :).
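For anyone unfamiliar with the preprocessing steps named above, here is a minimal plain-Python sketch of normalizing and one-hot encoding. It is not the Magicsheets pipeline; the function names and sample values are made up for illustration, and min-max scaling is only one of several ways to normalize.

```python
def min_max_normalize(values):
    """Rescale numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Made-up sample columns.
prices = [10.0, 20.0, 40.0]
colors = ["red", "blue", "red"]

scaled = min_max_normalize(prices)
categories, encoded = one_hot(colors)

print(scaled)
print(categories, encoded)
```

In practice the scaling bounds and category list have to be learned on the training data and reused at prediction time, which is one of the steps that is easy to get wrong by hand.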
> I’m guessing the people who can’t pipe a csv to pandas + scikit

People who know how to do that neglect the important steps too. The hardest part about ML is finding practitioners who aren't false positives.

I hear this a lot, but I haven't seen much of it in the real world. In my experience, it's a competitive field to get into, so even the junior people are pretty decent. They have to be.