| I have already explained in the post above why optimization is very much a part of machine learning. Now, you say that classification and optimization are two different things, and that is true. But really, it can be more fruitful to look at supervised learning as a special case of unsupervised learning [1]. It is best to seek the most general framework from which to understand things, as it leads to deeper understanding, broader applicability of concepts and easier cross-fertilization across fields. For example, understanding the spectral theorem makes SVD (hence PCA) and the DFT class of algorithms much clearer. Understand the notions of Lp norms, convexity, adjoints, loss functions and regularization, and a whole bunch of seemingly different algorithms collapse into facets of the same thing. Hook that up to automatic differentiation and a few optimization algorithms and you can write anything from neural networks, SVMs and regularized logistic regression to non-negative tensor factorization in a few lines. You stop making arbitrary divisions between classification and optimization. Much the same kind of collapse can be done for the dual [2] notion of probabilistic algorithms by thinking in terms of graphs, simplices, parametrizations, families and conjugacy. The best thing about all this is that you stop thinking "which algorithm should I use?" and start thinking "what do I want to do? What is the best mathematical model for this?" What would really be great is a machine learning language, where one could work with things akin to folds and maps on various structures and manifolds and make the incidental complexity disappear. Stuff like [3] is really encouraging for that direction.
[1] The problem of learning a distribution usually is called unsupervised learning, but in this case, supervised learning formally is a special case of unsupervised learning; if we admit that all the functional relations or associations that we are trying to learn have any element of noise or stochasticity, then this connection between supervised and unsupervised problems is quite general. http://www.princeton.edu/~wbialek/our_papers/bnt_01a.pdf
[2] http://golem.ph.utexas.edu/category/2007/01/duality_between_...
[3] http://www.ipam.ucla.edu/publications/gss2012/gss2012_10605.... |
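
To make the "few lines" point about loss functions, regularization and automatic differentiation a bit more concrete, here is a rough sketch in JAX. It is only an illustration under my own assumptions, not something taken from the linked references: a linear model, labels in {-1, +1}, plain gradient descent, and made-up names (make_objective, fit, accuracy) and toy data. The only thing that differs between the "logistic regression" and the "linear SVM" is which loss gets plugged in.

```python
import jax
import jax.numpy as jnp

# Two loss functions of the margin y * f(x); everything else is shared.
def logistic_loss(margin):
    # log(1 + exp(-margin)), written in a numerically stable form
    return jnp.logaddexp(0.0, -margin)

def hinge_loss(margin):
    return jnp.maximum(0.0, 1.0 - margin)

# A regularizer (squared L2 norm).
def l2(w):
    return jnp.sum(w ** 2)

def make_objective(loss, penalty, lam):
    # Hypothetical helper: objective = mean loss + lam * penalty.
    def objective(w, X, y):
        margins = y * (X @ w)
        return jnp.mean(loss(margins)) + lam * penalty(w)
    return objective

def fit(objective, X, y, steps=500, lr=0.1):
    # Plain gradient descent; the gradient comes from autodiff.
    grad = jax.jit(jax.grad(objective))
    w = jnp.zeros(X.shape[1])
    for _ in range(steps):
        w = w - lr * grad(w, X, y)
    return w

def accuracy(w, X, y):
    preds = jnp.where(X @ w >= 0, 1.0, -1.0)
    return float(jnp.mean(preds == y))

# Toy, linearly separable data (made up for this sketch).
key = jax.random.PRNGKey(0)
features = jax.random.normal(key, (200, 2))
X = jnp.hstack([features, jnp.ones((200, 1))])  # last column acts as a bias
true_w = jnp.array([2.0, -1.0, 0.0])
y = jnp.where(X @ true_w >= 0, 1.0, -1.0)

# Same training loop, different loss: two "different" algorithms.
w_logreg = fit(make_objective(logistic_loss, l2, 0.01), X, y)
w_svm = fit(make_objective(hinge_loss, l2, 0.01), X, y)

print("logistic regression accuracy:", accuracy(w_logreg, X, y))
print("linear SVM accuracy:", accuracy(w_svm, X, y))
```

Swapping l2 for another penalty, or the linear map X @ w for a small network, should only mean changing what goes into make_objective; the fit loop would stay the same, which is the kind of collapse I was getting at.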