by srean
3940 days ago
Could you elaborate on its differences from the kernel trick? It seems to me @xtacy is exactly right on that modeling point. Training the model using scale-appropriate optimization is an orthogonal concern, if that is what you were getting at, i.e. an off-the-shelf kernelization would not be sufficient to run this at scale.
The kernel trick uses an implicit mapping into a higher-dimensional feature space. Factorization Machines, on the other hand, use an explicit mapping into the polynomial kernel space. However, they jointly learn the "right" polynomial by mapping the base features into a low-dimensional dense space, where the weights of the higher-order terms of the polynomial are dot products of the learned feature vectors (we're not taking dot products in the original feature space!). So FM learns the right polynomial (a non-convex task) jointly with the original supervised learning task.
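
To make the "dot products in a low-dimensional space" point concrete, here is a minimal NumPy sketch of a second-order FM score. It assumes the usual formulation in which the weight of the pairwise term x_i * x_j is the dot product of the latent vectors V[i] and V[j]. The names `fm_predict`, `fm_predict_naive`, `w0`, `w` and `V` are made up for illustration and don't come from any particular library. The fast version relies on the identity sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2], so the pairwise terms cost O(n*k) rather than O(n^2 * k).

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM score for a single feature vector.

    x: (n,) features, w0: scalar bias, w: (n,) linear weights,
    V: (n, k) latent vectors, one row per feature (illustrative names).
    """
    linear = w0 + w @ x
    xv = x @ V                  # (k,): sum_i V[i, f] * x[i] for each factor f
    x2v2 = (x ** 2) @ (V ** 2)  # (k,): sum_i V[i, f]^2 * x[i]^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score with the pairwise sum written out explicitly."""
    n = len(x)
    score = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            # The weight of x[i] * x[j] is a dot product in the latent space.
            score += (V[i] @ V[j]) * x[i] * x[j]
    return score

# Small random check that both versions agree.
rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
w0 = 0.1
w = rng.normal(size=n)
V = rng.normal(size=(n, k))
assert np.isclose(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Note that `V` is a learned parameter here, which is where the non-convexity comes from: the score is bilinear in the rows of `V`, unlike a kernel machine where the feature map is fixed up front.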