Hacker News
by nonameiguess 1899 days ago
(Some) ML is non-parametric, but there are always some questions you need to be able to answer about your data. At a bare minimum: Is the generating process ergodic? What is the error of your measurement procedure? How representative of the true underlying distribution is your sampling procedure? All use of data should start with some exploratory analysis before you ever get to the modeling stage.
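Two of those questions can be checked with very little code before any modeling happens. The following is a minimal sketch under stated assumptions, not a complete exploratory analysis. It assumes you know the true population share of each group from some external source (for example a census), and that you have repeated measurements of the same items. The function names `representation_gaps` and `pooled_measurement_sd` are hypothetical and used only for illustration. Ergodicity is not covered here.

```python
from collections import Counter
import math


def representation_gaps(sample_labels, population_shares):
    """Return sample share minus known population share for each group.

    Assumes population_shares is known from an external source (hypothetical).
    Large positive or negative gaps suggest the sampling procedure over- or
    under-represents that group. Groups absent from population_shares are ignored.
    """
    counts = Counter(sample_labels)
    n = len(sample_labels)
    return {g: counts[g] / n - share for g, share in population_shares.items()}


def pooled_measurement_sd(repeats):
    """Estimate measurement error as the pooled within-item standard deviation.

    `repeats` maps each item to several readings of that same item, so any
    spread within an item is attributed to the measurement procedure.
    At least one item must have two or more readings.
    """
    sum_sq = 0.0
    dof = 0
    for readings in repeats.values():
        mean = sum(readings) / len(readings)
        sum_sq += sum((x - mean) ** 2 for x in readings)
        dof += len(readings) - 1
    return math.sqrt(sum_sq / dof)


if __name__ == "__main__":
    # Group "a" is over-sampled relative to a 50/50 population.
    print(representation_gaps(["a", "a", "a", "b"], {"a": 0.5, "b": 0.5}))
    # Two items, each measured twice; readings differ by 2 units within each item.
    print(pooled_measurement_sd({"x": [10.0, 12.0], "y": [5.0, 7.0]}))
```

Here the first call reports a gap of +0.25 for "a" and -0.25 for "b", and the second estimates a measurement standard deviation of about 1.414 (the square root of 2).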

Once you have a model, at a minimum understand how to tune for the tradeoffs between different types of error, and don't naively optimize for pure accuracy. At the obvious extremes: if you're trying to prevent a nuclear attack, false negatives are much more costly than false positives; if you're trying to decide whether to execute someone for murder, false positives are much more costly than false negatives. Understand the relative costs of each type of error for whatever you're trying to predict, and proceed accordingly.
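One common way to act on this is to keep the model's scores fixed and choose the decision threshold that minimizes total misclassification cost instead of error count. The sketch below assumes a binary classifier that outputs a score per example, with label 1 meaning the costly-to-miss or costly-to-wrongly-flag event, and per-error costs you supply yourself. The function names and the toy data are hypothetical.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost when every score >= threshold is predicted positive (label 1)."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the threshold with the lowest total cost on this data.

    Every observed score is a candidate, plus +inf (predict nothing positive),
    which covers every distinct way of splitting the examples.
    Ties go to the lowest threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


if __name__ == "__main__":
    scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 1]
    # Missing a positive is 100x worse than a false alarm (the nuclear-attack case).
    print(best_threshold(scores, labels, cost_fp=1, cost_fn=100))   # 0.4
    # A false positive is 100x worse than a miss (the execution case).
    print(best_threshold(scores, labels, cost_fp=100, cost_fn=1))   # 0.8
```

The same scores lead to a permissive threshold when misses are expensive and a strict one when false alarms are expensive. In practice you would choose the threshold on held-out data rather than on the training set.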