by Longwelwind
3289 days ago
In the decision function of an SVM, you compute the scalar product of your new sample point with each of the support vectors (points that are on the margin of your hyperplane, or more precisely, the points that constrain your hyperplane): x · sv
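To make that concrete, here is a rough sketch of such a decision function in plain Python. The weights `alphas`, the `labels` and the bias `b` are made-up numbers for illustration (in practice they come out of training), and the function names are my own:

```python
def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def decision(x, support_vectors, alphas, labels, b):
    """Weighted sum of the scalar products x . sv, plus a bias."""
    return sum(
        alpha * y * dot(x, sv)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    ) + b

# Hypothetical values, not taken from any trained model.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [0.5, 0.5]
labels = [+1, -1]
b = 0.0

score = decision((2.0, 0.0), support_vectors, alphas, labels, b)
predicted = 1 if score >= 0 else -1  # the sign of the score picks the class
```

Note that the sample x only ever appears inside `dot(x, sv)`, which is what makes the rest of this explanation work.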
The "z" the article defines is a new component that will be taken into account in the scalar product. A more mathematical way of seeing it is that you define a function phi that takes an original sample of your dataset and transforms it into a new vector. In our case, we simply compute a new dimension (x3) from the two original dimensions (x1, x2) and append it as a third component of our vector: phi(x) = [x1, x2, x1² + x2²]
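As a quick sketch in Python (the sample points are arbitrary ones I picked, not taken from the article):

```python
def phi(x):
    """Map a 2-D point (x1, x2) to the 3-D point (x1, x2, x1**2 + x2**2)."""
    x1, x2 = x
    return (x1, x2, x1 ** 2 + x2 ** 2)

near = phi((0.5, 0.0))  # a point close to the origin
far = phi((3.0, 4.0))   # a point far from the origin
```

The third component is the squared distance to the origin, so points far from the origin end up much "higher" than points close to it (0.25 versus 25.0 here). Two groups that sit at different distances from the origin can then be split by a flat plane in the new space.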
The scalar product we will have to compute in our decision function can then be expressed as (these are the a and b in the article, i.e. the sample and the support vector in our new space): phi(x) · phi(sv)
The SVM doesn't need phi(x) or phi(sv) themselves, only the scalar product of those two vectors. The kernel trick is to find a function k that satisfies k(x, sv) = phi(x) · phi(sv)
and that satisfies Mercer's condition (I'll let Google explain what it is). Your SVM will then compute this (simpler) k function instead of the full scalar product in the new space. There are multiple "common" kernel functions (Wikipedia has examples of them [1]), and choosing one is a parameter of your model (ideally, you would then set up a testing protocol to find the best one).
[1] https://en.wikipedia.org/wiki/Positive-definite_kernel#Examp...
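For the phi used here you can work out such a k by hand: phi(x) · phi(sv) = x1·sv1 + x2·sv2 + (x1² + x2²)(sv1² + sv2²), which is x · sv + |x|² |sv|². A small Python check of that identity (the two points are arbitrary, and the helper names are mine):

```python
def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit mapping into the 3-D space."""
    x1, x2 = x
    return (x1, x2, x1 ** 2 + x2 ** 2)

def k(x, sv):
    """Kernel matching phi: x . sv + |x|^2 * |sv|^2, using only 2-D inputs."""
    return dot(x, sv) + dot(x, x) * dot(sv, sv)

x, sv = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(sv))  # maps both points first, then multiplies
via_kernel = k(x, sv)            # never builds phi(x) or phi(sv)
```

Both routes give the same number. For a mapping this small the kernel saves almost nothing; the payoff comes with kernels whose implicit space is huge or infinite-dimensional (the RBF kernel being the usual example), where building phi(x) explicitly is not an option.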
|
And if I am following correctly, it would make sense that the final step would then be:
We would maximize the dot product of a new observation with the support vectors to determine its classification (red or blue)