
I said before that the function shape changes as you update the parameters that govern the function, but that's actually very misleading (sorry), since the kernel parameters only indirectly govern the function. What the parameters directly govern is the joint probability distribution P(f(x1), f(x2), ..., f(xn)). So the function f is implicitly defined by how likely the entire sequence of f values is.
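To make that a bit more concrete, here's a rough sketch (mine, not from the article) of kernel parameters shaping the joint distribution rather than the function itself. I'm assuming a zero-mean GP with a squared-exponential kernel; `rbf_kernel`, the `lengthscale` values and the five input points are just illustrative choices.

```python
import numpy as np

def rbf_kernel(x, lengthscale, variance=1.0):
    # Squared-exponential kernel (an assumed choice for illustration):
    # k(xi, xj) = variance * exp(-0.5 * ((xi - xj) / lengthscale)^2)
    diff = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

x = np.linspace(0.0, 1.0, 5)       # the inputs x1..xn
rng = np.random.default_rng(0)

for lengthscale in (0.1, 1.0):
    # The kernel parameter sets the covariance of (f(x1), ..., f(xn)).
    K = rbf_kernel(x, lengthscale)
    # Small jitter on the diagonal keeps the covariance numerically valid.
    cov = K + 1e-8 * np.eye(len(x))
    # One draw of the whole vector of function values from the joint Gaussian.
    f = rng.multivariate_normal(np.zeros(len(x)), cov)
    print("lengthscale:", lengthscale)
    print("  cov(f(x1), f(x2)):", K[0, 1].round(3))
    print("  sampled f values: ", f.round(2))
```

With a larger lengthscale, neighbouring f values are more strongly correlated, so the sampled values tend to vary more smoothly. The parameter never touches f directly; it only changes which sequences of f values are likely.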

So how does it change shape? This part is something I don't fully grasp myself yet, but I can sketch a crude Bayesian interpretation, which is how I think of it. It's not completely correct, but it works as a placeholder until I fully work out the math of updating the parameters.

Basically, from a Bayesian perspective we can treat the joint distribution of function outputs as a likelihood conditioned on the kernel parameters theta: p(f(x1), f(x2), ... | theta).

Then we can derive the posterior distribution over theta p(theta | f(x1), f(x2), ...) like so:

p(theta | f(x1), f(x2), ...) ∝ p(f(x1), f(x2), ... | theta) p(theta),

where the normalizing constant p(f(x1), f(x2), ...) is dropped because it doesn't depend on theta.

So we fit the theta parameters based on how well they explain the observed data we feed our Bayesian model.
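Here's a crude sketch of that fitting step, under assumptions of my own: theta is a single lengthscale of a squared-exponential kernel, the GP has zero mean, the prior is Exponential(1), the "observed" f values are a made-up sine curve, and the posterior is just evaluated on a grid instead of being sampled properly. All the names (`log_likelihood`, `log_prior`, `thetas`) are hypothetical.

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def log_likelihood(f, x, lengthscale, jitter=1e-6):
    # log p(f(x1), ..., f(xn) | theta): zero-mean Gaussian with kernel covariance.
    K = rbf_kernel(x, lengthscale) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, f)             # L^{-1} f
    return (-0.5 * alpha @ alpha              # -0.5 * f^T K^{-1} f
            - np.log(np.diag(L)).sum()        # -0.5 * log det K
            - 0.5 * len(x) * np.log(2 * np.pi))

def log_prior(lengthscale):
    # Assumed prior for illustration: Exponential(1) on the lengthscale.
    return -lengthscale

# Made-up observations of the function at 8 inputs.
x = np.linspace(0.0, 1.0, 8)
f = np.sin(2 * np.pi * x)

# Unnormalized log posterior = log likelihood + log prior, on a grid of theta.
thetas = np.linspace(0.05, 2.0, 40)
log_post = np.array([log_likelihood(f, x, t) + log_prior(t) for t in thetas])

# Normalize over the grid (subtract the max first for numerical stability).
post = np.exp(log_post - log_post.max())
post /= post.sum()

print("most probable lengthscale on the grid:", thetas[post.argmax()])
```

The grid is only there to keep the example tiny. With more than one or two kernel parameters you'd use MCMC or optimize the likelihood instead.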

FWIW, I recommend chapter 14 of Richard McElreath's Statistical Rethinking for a better introduction to GPs. This article glosses over a lot of the intuition and introductory concepts that you need to really grok them.