|
|
|
|
|
by sparsevector
5173 days ago
|
|
The set of support vectors is just the set of training examples that have non-zero alpha parameters. To implement the gradient update you just evaluate the support vector machine on the example (using the explicit kernel expansion) and then if the example has signed margin less than 1 you add y * eta to the corresponding alpha value. The difficulty with Pegasos for non linear kernels is the support set quickly becomes very large and so evaluating the model becomes very slow. Note that since the alpha values are not constrained to be non-negative (unlike the standard dual algorithms) the alpha values don't ever get clipped to zero--instead they just slowly converge to zero. It's still (I think) one of the fastest methods in terms of theoretical convergence guarantees but perhaps not as fast as LaSVM or something similar in practice. However, there's been a more general trend in machine learning to use linear models with lots of features instead of kernel models, partially because of these sort scalability issues. |
|
http://www.mblondel.org/journal/2010/10/31/kernel-perceptron...
However it seems that you need to compute the kernel expansion on the full set of samples (or maybe just the accumulated past samples?): this does not sound very online to me...