Hacker News
by alok-g 2 days ago
Thanks. I read it several times, and along with another response, I think I have a better understanding now, though I still don't have a complete grasp.

>> So sampling one point gives us the gold amount for cheese amount 1, 2, and 3. This is the 'function', and ...

I get this part, so each point in this N-dimensional space yields a function f of the index, and this is the function.

>> Yes, the function changes shape as you get more data because the parameters governing that function

Getting more data should now give more such points (in N-dimensional space), but if each such point is the 'function', how is it changing shape?

Nevertheless, I think I have a much better picture after reading your and the other responses here than from the original article, which I still find confusing even on rereading.

1 comment

I said before that the function changes shape as you update the parameters that govern it, but that's actually very misleading (sorry), since the kernel parameters only indirectly govern the function. What the parameters directly govern is the joint probability distribution P(f(x1), f(x2), ..., f(xn)). So the function f is implicitly defined by how likely the entire sequence of f values is.
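To make that concrete, here's a rough NumPy sketch (mine, not from the article). It assumes a zero-mean GP with a squared-exponential kernel whose only parameter of interest is a lengthscale; the names rbf_kernel and sample_functions and the jitter value are just my choices for illustration. Each draw from the joint Gaussian is one whole vector (f(x1), ..., f(xn)), i.e. one 'function'.

```python
import numpy as np

def rbf_kernel(xs, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between every pair of inputs."""
    diff = xs[:, None] - xs[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_functions(xs, lengthscale, n_samples, seed=0):
    """Draw vectors (f(x1), ..., f(xn)) from a zero-mean joint Gaussian."""
    rng = np.random.default_rng(seed)
    # Small diagonal jitter (assumed value) keeps the covariance numerically PSD.
    cov = rbf_kernel(xs, lengthscale) + 1e-6 * np.eye(len(xs))
    return rng.multivariate_normal(np.zeros(len(xs)), cov, size=n_samples)

xs = np.linspace(0.0, 5.0, 50)
wiggly = sample_functions(xs, lengthscale=0.3, n_samples=3)  # short lengthscale
smooth = sample_functions(xs, lengthscale=3.0, n_samples=3)  # long lengthscale

# Average jump between neighbouring points: a crude measure of wiggliness.
print(np.abs(np.diff(wiggly, axis=1)).mean())
print(np.abs(np.diff(smooth, axis=1)).mean())
```

The lengthscale never touches f directly. It only changes the covariance matrix, and that changes which vectors of f values are likely, which is what I mean by "indirectly".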

So how does it change shape? Well, this part is actually something I don't fully grasp myself yet. But I can sketch a crude Bayesian interpretation, which is how I think of it. It's not completely correct, but it works as a placeholder until I fully work out the math of updating the parameters.

Basically, from a Bayesian perspective we can treat the joint distribution of function outputs as a likelihood conditioned on the kernel parameters theta: p(f(x1), f(x2), ... | theta).

Then we can derive the posterior distribution over theta p(theta | f(x1), f(x2), ...) like so:

p(theta | f(x1), f(x2), ...) ∝ p(f(x1), f(x2), ... | theta) p(theta).

So we fit the theta parameters based on how well they explain the observed data we feed our Bayesian model.
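Here's a crude sketch of that update, under a pile of assumptions that are mine and not from the article: theta is a single lengthscale, the kernel is squared-exponential, the prior on the lengthscale is a standard log-normal, the observations are made-up noise-free values of sin(x), and the posterior is evaluated on a grid rather than sampled properly.

```python
import numpy as np

def rbf_kernel(xs, lengthscale):
    """Squared-exponential covariance with unit variance."""
    diff = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def log_likelihood(f_obs, xs, lengthscale, jitter=1e-6):
    """log p(f(x1), ..., f(xn) | theta) for a zero-mean joint Gaussian."""
    cov = rbf_kernel(xs, lengthscale) + jitter * np.eye(len(xs))
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, f_obs)         # so alpha @ alpha = f^T cov^{-1} f
    log_det = 2.0 * np.log(np.diag(chol)).sum()  # log |cov|
    return -0.5 * (alpha @ alpha + log_det + len(xs) * np.log(2.0 * np.pi))

def log_prior(lengthscale):
    """Assumed prior: standard log-normal, up to an additive constant."""
    return -0.5 * np.log(lengthscale) ** 2 - np.log(lengthscale)

xs = np.linspace(0.0, 5.0, 15)
f_obs = np.sin(xs)  # made-up observations, for illustration only

grid = np.linspace(0.2, 4.0, 40)  # candidate lengthscales
log_post = np.array([log_likelihood(f_obs, xs, l) + log_prior(l) for l in grid])

# Normalise over the grid; subtracting the max avoids underflow in exp.
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()

print("most probable lengthscale on the grid:", grid[np.argmax(posterior)])
```

Feeding in more observations changes log_likelihood, which reshapes the posterior over theta, and that in turn changes which functions are plausible. That's the sense in which the "shape" moves.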

FWIW, I recommend chapter 14 of Richard McElreath's Statistical Rethinking for a better introduction to GPs. This article kind of glosses over a lot of the intuition and introductory concepts that you need to really grok it.