HN Mirror

	by syntaxing 1727 days ago
	I have mixed feelings. On one hand, this is pretty awesome in the sense that more people should have easier access to different ML tools. But on the other hand, I’m guessing the people who can’t pipe a csv to pandas + scikit will probably neglect important steps like data preprocessing and output interpretation (e.g., balanced accuracy vs. accuracy).
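
To make the balanced accuracy vs. accuracy point concrete, here is a minimal sketch in plain Python. The dataset is hypothetical (95 negatives, 5 positives), and the "model" simply predicts the majority class every time. Plain accuracy looks great, while balanced accuracy (the mean of per-class recall) exposes that the model never finds a single positive. scikit-learn provides the same metric as `sklearn.metrics.balanced_accuracy_score`. The version below avoids that dependency.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so each class counts equally regardless of its size."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of examples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A 95% accuracy here is exactly what a do-nothing baseline achieves. Balanced accuracy of 0.5 is chance level for two classes.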

hervature 1727 days ago

I like to say "the dollars are in the data". Tools like these do open up ML to a wider audience, but they do nothing to help people understand that nothing beats proper record keeping and data collection. Furthermore, they handicap you in certain ways. For example, how are missing values handled?
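
The thread doesn't say how any particular tool handles missing values. As a hypothetical illustration of why the choice matters, this sketch fills the same column two common ways (mean vs. median imputation) and also records a missing-value indicator. The income values are made up, and the one outlier (400) is there on purpose.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Fill None entries with the column mean or median.

    Also returns a 0/1 indicator marking which entries were originally missing,
    so a model can still learn from the fact that a value was absent.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    filled = [fill if v is None else v for v in values]
    indicator = [int(v is None) for v in values]
    return filled, indicator


# Hypothetical income column (in thousands) with two gaps and one outlier.
incomes = [30, 35, None, 40, 400, None]

filled_mean, missing = impute(incomes, "mean")
filled_median, _ = impute(incomes, "median")

print(filled_mean)    # [30, 35, 126.25, 40, 400, 126.25]  (mean pulled up by the outlier)
print(filled_median)  # [30, 35, 37.5, 40, 400, 37.5]
print(missing)        # [0, 0, 1, 0, 0, 1]
```

Neither choice is "correct" in general. The point is that a tool which decides silently also decides what your model sees.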

syntaxing 1727 days ago

Totally agree; getting actionable and confident results from ML is really tough. Imputing data, processing features, choosing metrics, etc. is almost an art form (though newer AutoML libraries have made this much easier).

janhenr 1726 days ago

Totally understand your mixed feelings. We are trying to do as many of the standard data preprocessing steps as we can (cleaning, normalizing, one-hot encoding, etc.) and are now switching over to an AutoML engine. We basically see the outputs of Magicsheets as great baseline models. Data scientists could definitely do better (especially with domain knowledge), but we should at least be able to give useful predictions back for most problems :).
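
The comment names three standard preprocessing steps but not how Magicsheets implements them. Below is a hypothetical, dependency-free sketch of those steps on toy data. The field names `city` and `sqm` are invented, and min-max scaling is used as one possible form of "normalizing". Cleaning here means trimming whitespace and lowercasing category labels so that " Berlin" and "Berlin " collapse into a single category.

```python
def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 vector per value."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


# Hypothetical spreadsheet rows with messy category labels.
rows = [
    {"city": " Berlin", "sqm": 50},
    {"city": "paris", "sqm": 80},
    {"city": "Berlin ", "sqm": 110},
]

# 1. Cleaning: normalize whitespace and case in the categorical column.
cities = [r["city"].strip().lower() for r in rows]
# 2. Normalizing: min-max scale the numeric column.
sqm_scaled = min_max_scale([r["sqm"] for r in rows])
# 3. One-hot encoding: turn each city into indicator columns.
categories, city_vectors = one_hot(cities)

# Final feature matrix: one-hot city columns followed by scaled area.
features = [vec + [s] for vec, s in zip(city_vectors, sqm_scaled)]

print(categories)  # ['berlin', 'paris']
print(features)    # [[1, 0, 0.0], [0, 1, 0.5], [1, 0, 1.0]]
```

In a real pipeline the scaling bounds and category list would be learned from training data only and then reused on new rows. Steps like these are what a "baseline model" tool automates.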

dreyfan 1727 days ago

> I’m guessing the people who can’t pipe a csv to pandas + scikit

People who know how to do that neglect the important steps too. The hardest part about ML is finding practitioners who aren't false positives.

jstx1 1727 days ago

I hear this a lot, but I haven't seen much of it in the real world. In my experience it's a competitive field to get into, so even the junior people are pretty decent. They have to be.