by denzil_correa 3260 days ago
> I've learned so far is that the data behind your ML code (and the way it is structured) is responsible for almost all the success or failure of any given ML algorithm

Data is indeed a necessary condition, but certainly not a sufficient one. You need a good marriage between feature engineering and data to have a good success rate. Learning curves [0] are a good way to tell whether your ML algorithm needs more data or better feature engineering.

[0] http://mlwiki.org/index.php/Learning_Curves
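
As a rough illustration of how a learning curve separates the two cases, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the synthetic data (y = x^2 plus noise), the helper names (fit_line, learning_curve, diagnose), and the target_error threshold. The diagnose rule is only the usual rule of thumb: if the model cannot even fit its training data, more data is unlikely to help and better features are needed; if it fits the training data but not the validation data, more data may help.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve(train_x, train_y, val_x, val_y, sizes):
    """Return (size, train_error, validation_error) per training size."""
    rows = []
    for n in sizes:
        model = fit_line(train_x[:n], train_y[:n])
        rows.append((n, mse(model, train_x[:n], train_y[:n]),
                     mse(model, val_x, val_y)))
    return rows

def diagnose(rows, target_error):
    """Rule of thumb applied to the largest training size (an assumption)."""
    _, train_err, val_err = rows[-1]
    if val_err <= target_error:
        return "good enough"
    if train_err > target_error:
        return "better features"  # high bias: cannot fit even the training set
    return "more data"            # high variance: fits training, not validation

# Synthetic data, assumed for illustration: y = x^2 + Gaussian noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(-3, 3) for _ in range(300)]
ys = [x * x + rng.gauss(0, 0.5) for x in xs]
train_x, val_x = xs[:200], xs[200:]
train_y, val_y = ys[:200], ys[200:]
sizes = [10, 50, 200]

def squared(values):
    """The 'engineered' feature: x^2 instead of raw x."""
    return [v * v for v in values]

raw_rows = learning_curve(train_x, train_y, val_x, val_y, sizes)
eng_rows = learning_curve(squared(train_x), train_y, squared(val_x), val_y, sizes)

for label, rows in (("raw x", raw_rows), ("x squared", eng_rows)):
    for n, train_err, val_err in rows:
        print(f"{label:10s} n={n:3d} train={train_err:.2f} val={val_err:.2f}")
    print(f"{label:10s} -> {diagnose(rows, target_error=1.0)}")
```

With the raw feature, both errors stay high no matter how many points are added, which points at the features rather than the amount of data. With the squared feature, the same line fit gets close to the noise floor.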

1 comment

Much of the programming with ML has moved towards cleaning, extrapolating and generating the data.

But this type of programming is, miraculously, bug-free. We never hear of data conversion gone wrong, corrupted data, or data mining without conclusive results here. Obviously such bugs lack the glamour of security bugs.

It's also very difficult to catch these errors. Your trained model just doesn't work as well as it could, but how would you be able to tell?
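
One partial answer, sketched under assumptions: check a few invariants on the data before it ever reaches training. The column names, plausible ranges, and the validate_rows helper below are all hypothetical, and checks like these only catch gross conversion errors (dropped rows, missing values, unit mix-ups), not subtle degradation.

```python
import math

def validate_rows(rows, expected_count, ranges):
    """Return a list of human-readable problems found in `rows`.

    rows: list of dicts, one per record.
    ranges: {column: (low, high)} plausible bounds, assumed by the caller.
    """
    problems = []
    if len(rows) != expected_count:
        problems.append(f"row count {len(rows)} != expected {expected_count}")
    for i, row in enumerate(rows):
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {i}: {col} is missing")
            elif not low <= value <= high:
                problems.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return problems

# Hypothetical converted records: one row was lost, one height was left in
# metres instead of centimetres, and one age failed to parse.
rows = [
    {"age": 34, "height_cm": 172.0},
    {"age": 29, "height_cm": 1.68},
    {"age": None, "height_cm": 180.0},
]
ranges = {"age": (0, 120), "height_cm": (40, 250)}

problems = validate_rows(rows, expected_count=4, ranges=ranges)
for problem in problems:
    print(problem)
```

Running a check like this after every conversion step at least turns some silent failures into loud ones; a model that is merely worse than it could be still needs something like a held-out baseline to notice.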