| HN Mirror

by magicalhippo 3 days ago

I was similarly confused, but after a few rounds with Gemini 3.5 Flash (extended) it cleared things up some, for me anyway.

> What is 'x' here?

So as I understand it, a Gaussian Process is defined in terms of a set of random variables which are indexed, typically by either time (t) or space (x). So in the concrete example, x here would be the amount of cheese inserted into the magical machine. In general the "index" can be a vector. Say the magical machine instead required inserting both cheese and milk to produce some amount of gold; then the index x would be two-dimensional, representing the amounts of cheese and milk you inserted.

> It does not seem the 'f' here is intended to be the specific 'f' introduced at the beginning of the article.

Right, it's general, and it's kinda confusing to use f when everything else seems to use X_t or similar. Here f is actually a random variable indexed by x, so one example could be

  f(x) = r_1 + x * r_2

where r_1 and r_2 are two independent random variables with the standard normal distribution. In this case f(x) represents all possible lines, and f(3) gives you the random variable for index 3, so r_1 + 3 * r_2, which also follows a normal distribution since sums and scalings of independent normal random variables are again normal.
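To make the "random variable for index 3" idea concrete, here is a minimal sketch that samples many (r_1, r_2) pairs and looks at f(3) across the samples. The function names, the sample count and the seed are my own choices for illustration, not something from the article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice


def sample_lines(n_samples, rng):
    """Draw n_samples pairs (r_1, r_2) of independent standard normals."""
    r1 = rng.standard_normal(n_samples)
    r2 = rng.standard_normal(n_samples)
    return r1, r2


def f(x, r1, r2):
    """Evaluate f(x) = r_1 + x * r_2 for every sampled pair."""
    return r1 + x * r2


r1, r2 = sample_lines(200_000, rng)
f3 = f(3.0, r1, r2)  # one value of f(3) per sampled line

# Theory for independent standard normals: mean 0, variance 1 + 3**2 = 10.
print("sample mean of f(3):", f3.mean())
print("sample variance of f(3):", f3.var())
```

Each (r_1, r_2) pair is one line, ie one realization of f, and fixing x = 3 across all of them gives the samples of the single random variable f(3).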

> The plots now have y and x, and x1 and x2. How are these related?

The left plot shows three realizations of y = f(x), i.e. for three different choices (samples) of the random variables that go into f(x). The right-hand plot shows the output of the kernel function for two indices x and x'. In the first example, the kernel function was the dot product between the two inputs, but given the indices are 1-dimensional that reduces to just k(x, x') = x * x'.

Back to the example, you can feed the machine various amounts of cheese and record the amounts of gold you get back. The amounts of cheese are the indices, which you use with the kernel function you picked. You run these through the Gaussian Process regression math, and you get a new function which takes an index (amount of cheese) and returns a normal distribution that predicts the amount of gold for that index.

The process spits out the mean and the variance of the normal distribution, so you can look at the variance to determine how certain you can be about the prediction, which will be centered around the mean.
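Here is a rough sketch of what that regression step can look like with the k(x, x') = x * x' kernel, using the textbook zero-mean GP posterior formulas. The cheese/gold numbers, the noise level and the function names are all made up for illustration; the article may set things up differently.

```python
import numpy as np


def linear_kernel(a, b):
    """k(x, x') = x * x' for 1-D inputs, evaluated for all pairs."""
    return np.outer(a, b)


def gp_predict(x_train, y_train, x_test, kernel, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test.

    noise_var is an assumed measurement-noise variance (hypothetical value).
    The returned variance is for the underlying function, without the noise.
    """
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = kernel(x_train, x_test)
    K_ss = kernel(x_test, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)


# Made-up measurements: amounts of cheese in, amounts of gold out.
cheese = np.array([1.0, 2.0, 3.0, 4.0])
gold = np.array([2.1, 3.9, 6.2, 7.8])

# Predict for one amount inside the measured range and one far outside it.
query = np.array([2.5, 10.0])
mean, var = gp_predict(cheese, gold, query, linear_kernel)

for q, m, v in zip(query, mean, var):
    print(f"cheese={q}: predicted gold ~ N(mean={m:.3f}, variance={v:.5f})")
```

With this kernel the variance is larger for the query far from the measured amounts, which is the "how certain can you be" part.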

As I understand it, the point of the left plot is that you can use it to get an idea for which kernel function to use for your measured data. And as mentioned you can easily make new kernel functions by adding (OR-like) and multiplying (AND-like) other kernel functions.

Also the author made a mistake: he said kernel functions are parameterless, but he meant non-parametric. The kernel functions he shows do have hyperparameters, for example l and p for the periodic kernel.
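As a sketch of both points (hyperparameters, and combining kernels by adding and multiplying), here is a periodic kernel in its commonly used form exp(-2 sin^2(pi |x - x'| / p) / l^2) together with sum and product helpers. I'm assuming this form matches the article's periodic kernel; the helper names are mine.

```python
import numpy as np


def periodic_kernel(a, b, length_scale=1.0, period=1.0):
    """Periodic kernel with hyperparameters l (length_scale) and p (period)."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)


def linear_kernel(a, b):
    """k(x, x') = x * x' for 1-D inputs."""
    return np.outer(a, b)


def add_kernels(k1, k2):
    """Sum of two kernels (the OR-like combination)."""
    return lambda a, b: k1(a, b) + k2(a, b)


def mul_kernels(k1, k2):
    """Product of two kernels (the AND-like combination)."""
    return lambda a, b: k1(a, b) * k2(a, b)


x = np.linspace(0.0, 4.0, 9)
k_sum = add_kernels(linear_kernel, periodic_kernel)
k_prod = mul_kernels(linear_kernel, periodic_kernel)

K_sum = k_sum(x, x)    # covariance matrix for "linear OR periodic"
K_prod = k_prod(x, x)  # covariance matrix for "linear AND periodic"
print(K_sum.shape, K_prod.shape)
```

Changing length_scale or period changes the shape of the functions you would sample, without turning the kernel into a different kind of kernel.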

At least that's my current understanding.

alok-g 2 days ago

Thanks. I think I have a better understanding now, though I still don't have a complete grasp.

>> f(x) = r_1 + x * r_2

>> The right-hand plot shows the output of the kernel function for two indices x and x'.

The kernel function here would be k(x, x') = Cov[f(x), f(x')] = Cov[f(r_1 + x * r_2), f(r_1 + x' * r_2)].

In this case, I am guessing we should be able to figure out what k(x, x') would be, but perhaps it would not be x * x'. x * x' sounds like a very special case.

magicalhippo 2 days ago

> k(x, x') = Cov[f(x), f(x')] = Cov[f(r_1 + x * r_2), f(r_1 + x' * r_2)].

As I understand it, it would instead be

  k(x, x') = Cov[f(x), f(x')] = Cov[r_1 + x * r_2, r_1 + x' * r_2]

I admit I haven't run through the full math. Given the definition of covariance I see how you get the x * x' term, but you're right in that it's not immediately obvious the other parts cancel fully.

magicalhippo 2 days ago

So working the math a bit, it seems clear the author implicitly assumes the random variables follow a standard normal distribution, so zero mean (E[r] = 0) and unit variance (Var(r) = 1). In that case, and with r_1 and r_2 independent, the E[...] = 0 terms drop out and the Var(...) = 1 terms leave k(x, x') = 1 + x*x', where the 1 is Var(r_1).
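Spelled out term by term, assuming r_1 and r_2 are independent standard normals as in the example above (the independence is what makes the mixed term vanish):

```latex
\begin{aligned}
k(x, x') &= \operatorname{Cov}[\,r_1 + x\,r_2,\; r_1 + x'\,r_2\,] \\
         &= \operatorname{Var}(r_1) + (x + x')\,\operatorname{Cov}(r_1, r_2) + x\,x'\,\operatorname{Var}(r_2) \\
         &= 1 + 0 + x\,x'
\end{aligned}
```

So x*x' on its own is the kernel you get for the offset-free case f(x) = x * r_2.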

alok-g 2 days ago

And then in the general case, the answer I guess would be an additive superposition of multiple functions including x.x' ... Hence, x.x' serves the purposes of explanation that the original author is aiming for.

alok-g 2 days ago

Thanks for the correction; silly miss on my part.
