you've probably done this yourself already. it's often called "featurization". for example, instead of applying a linear learner to vectors x in R^d, you apply it to vectors f(x), where f computes a bunch of features of x. a popular choice for f is the set of monomials of some fixed order k in the coordinates of x. hashing families are another good idea (Alex Smola does this). more generally, any random nonlinear function f is a good candidate (i call that analysis "Random Kitchen Sinks"). when x is structured data, f usually just returns counts in histogram bins of some kind.
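
to make that concrete, here is a rough sketch of "linear learner on f(x) instead of x". everything in it is my own illustration, not code from any particular paper or library: the function names, the toy target y = x0 * x1, plain least squares as the linear learner, and cosines of random projections as one possible "random nonlinear function" are all assumptions. the monomial map also includes the lower orders and a constant, which is a convenience choice.

```python
from itertools import combinations_with_replacement

import numpy as np


def monomial_features(X, order=2):
    """f(x) = all monomials of the coordinates of x up to `order`, plus a constant."""
    n, d = X.shape
    cols = [np.ones(n)]  # order-0 term (intercept)
    for k in range(1, order + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def random_cosine_features(X, n_features=50, scale=1.0, seed=0):
    """f(x) = cos(W^T x + b) with random W and b (one possible random nonlinear f)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) * scale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def fit_linear(F, y):
    """the linear learner: ordinary least squares on the feature matrix F."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w


def mse(F, w, y):
    """mean squared training error of the linear model w on features F."""
    return float(np.mean((F @ w - y) ** 2))


# toy problem (hypothetical): the target is a product, so it is not linear in x
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = X[:, 0] * X[:, 1]

for name, F in [
    ("raw x", X),
    ("monomials up to order 2", monomial_features(X, 2)),
    ("random cosines", random_cosine_features(X, 50)),
]:
    w = fit_linear(F, y)
    print(name, "training mse:", mse(F, w, y))
```

the learner never changes, only the representation it sees does. the order-2 monomials contain x0 * x1 as a column, so the same least-squares fit that fails on raw x can represent the target exactly.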