by rakmial 3478 days ago
Actually there's a very clear definition of what types of problems ML ought to be used for, and that category of problem is what defines it. Those familiar with regression (and stats in general) ought to be familiar with it already - it's a question of how the data types of the independent and dependent variables relate.

In brief, you're going to run up against two types of data - categorical and continuous. (There are facets to this, e.g. ordinal, but these are really the elemental types of data.) The data types of the independent and dependent variables determine what kind of analysis you may conduct.

Categorical Independent vs. Categorical Dependent, for example, is fairly restrictive, as makes logical sense. You may cross-tabulate, you may score likelihood based on previous observation, but obviously, because all of the data involved are non-numeric, there's no chance for regression, ANOVA, etc. Linear Regression is used when both independent and dependent variables are continuous, and cross-category differencing techniques like ANOVA may be used when the independent is categorical and the dependent is continuous.
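
To make the categorical/categorical case concrete, here is a minimal sketch of cross-tabulating and scoring likelihood from previous observations. The data and the function names (`cross_tabulate`, `likelihood`) are made up for illustration and are not from any particular library.

```python
from collections import Counter

# Hypothetical observations: (independent category, dependent category).
observations = [
    ("red", "buy"), ("red", "buy"), ("red", "skip"),
    ("blue", "skip"), ("blue", "skip"), ("blue", "buy"),
]

def cross_tabulate(pairs):
    """Count how often each (independent, dependent) pair occurs."""
    return Counter(pairs)

def likelihood(pairs, given, outcome):
    """Estimate P(outcome | given) from observed frequencies."""
    table = cross_tabulate(pairs)
    total = sum(n for (x, _), n in table.items() if x == given)
    return table[(given, outcome)] / total if total else 0.0

print(cross_tabulate(observations))
print(likelihood(observations, "red", "buy"))
```

Nothing here needs the categories to be numeric; it is all counting.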

The part you don't typically learn until grad school is when the independent is continuous and the dependent is categorical, i.e., in ML, a classification problem. The standard statistical methods used as the foundation for these problems are logistic regression and logit/probit models. It's the expansion of these methods that led to ML in the first place.
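
For the continuous-independent, categorical-dependent case, a rough sketch of logistic regression fit by plain gradient descent might look like the following. The data, the learning rate, the epoch count, and the names `fit_logistic` and `predict` are all assumptions chosen for illustration; a real analysis would use a stats package.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Assign class 1 when the modeled probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical data: one continuous predictor, one binary outcome.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(predict(w, b, 1.0), predict(w, b, 4.0))
```

The input stays continuous throughout; only the output is thresholded into a category.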

1 comments

If I'm reading this correctly, it's just wrong. Whatever the distinction between data analysis and ML might be, it is more than just whether your data and predicted quantities are discrete or continuous.

> Categorical Independent vs. Categorical Dependent, for example, is fairly restrictive, as makes logical sense. You may cross-tabulate, you may score likelihood based on previous observation, but obviously, because all of the data involved are non-numeric, there's no chance for regression, ANOVA, etc

If you are implying that categorical -> categorical predictions are not ML: as a counterexample, natural language is a categorical input (words) that can be used to predict any number of categorical variables (parse trees, semantic categories, etc). I think it's safe to say that the field of NLP is doing machine learning.
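
As a toy illustration of a learned categorical -> categorical predictor, here is a sketch of a multinomial naive Bayes classifier with add-one smoothing that maps words to a label. The training examples, the labels, and the names `train_naive_bayes` and `classify` are invented for this example; real NLP systems are far more involved.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Collect label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        for word in words:
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(model, words):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: categorical input (words), categorical output.
examples = [
    (["great", "fun", "movie"], "positive"),
    (["loved", "great", "acting"], "positive"),
    (["boring", "slow", "movie"], "negative"),
    (["awful", "boring", "plot"], "negative"),
]

model = train_naive_bayes(examples)
print(classify(model, ["great", "acting"]))
```

Both sides of the prediction are categories, and the model is still fit from data.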

Thanks for the sanity check. I read that reply, and got bogged down enough that I was worried my initial reaction of "what, that's not relevant!" was born of ignorance. Discrete/continuous is a distinction worth making, but as a hidden 'definition' for ML I really don't understand it.