| HN Mirror



	by natch 2184 days ago
	Very nice interactive tutorial tool. I wish some of the terms were defined. Hyperparameter? Convolution? Kernel?

3 comments

mjburgess 2184 days ago

Model: the learnt relationship (e.g., f(x; a,b) = ax + b)

Parameter: an aspect of the model, a dial whose value is fixed by the data (e.g., a)

Kernel (as used here): a subset of such parameters

Algorithm: procedure which accepts data and produces a model

Hyperparameter: an aspect of the algorithm, a dial which changes model production

Convolution: A convolution of image A and filter B describes to what degree A is "like" B. Here "filter B" is a kernel, i.e., a parameter set learnt by the network.

The goal of a CNN is to produce a model whose parameters are image filters that describe the degree to which an image expresses various shapes. By learning the filters from an image set, the network is specialized to distinguish images in that set.
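
A rough sketch of the "how much is A like B" idea, in plain Python. The 2x2 `diagonal` filter and the 4x4 `image` are made up for illustration, and a real CNN would learn its filter values rather than have them written by hand. Note that what CNN libraries call convolution is usually cross-correlation (the kernel is not flipped), which is what this does.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Each output value is the sum of elementwise products between the
    kernel and the image patch under it: large when the patch looks
    like the kernel, small or negative when it does not.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical hand-written filter that "likes" a top-left to bottom-right stroke.
diagonal = [[1, -1],
            [-1, 1]]

# Hypothetical image containing exactly such a stroke in the middle.
image = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]

response = correlate2d(image, diagonal)
for row in response:
    print(row)
```

The response is strongest at the position where the filter sits exactly on the stroke, which is the sense in which the output says how much each patch is "like" the filter.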


f6v 2184 days ago

> Hyperparameter: an aspect of the algorithm, a dial which changes model production

This seems a bit cryptic. The way I understand hyperparameters, they define how a model learns, e.g. you can set an alpha in gradient descent. Compared to "ordinary" parameters, hyperparameters do not define the relationship between data and output.
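
To make the distinction concrete, here is a minimal sketch that fits the line f(x; a,b) = ax + b from the parent comment with plain gradient descent. The data points and the function name `fit_line` are invented for the example. `a` and `b` come out of the data; `alpha` and `steps` are chosen by whoever runs the training.

```python
def fit_line(xs, ys, alpha, steps):
    """Fit y = a*x + b by batch gradient descent on mean squared error.

    alpha (learning rate) and steps are hyperparameters: they control
    how learning proceeds. a and b are parameters: they are what is learnt.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((a*x + b - y)^2) with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b = fit_line(xs, ys, alpha=0.05, steps=2000)
slow_a, slow_b = fit_line(xs, ys, alpha=0.001, steps=100)

print(f"alpha=0.05,  2000 steps: a={a:.3f}, b={b:.3f}")
print(f"alpha=0.001, 100 steps:  a={slow_a:.3f}, b={slow_b:.3f}")
```

Same data, same algorithm, different alpha and step count, and you end up with a different model. Nothing about alpha appears in the final f(x; a,b), which is the point about it not defining the relationship between data and output.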


natch 2184 days ago

These definitions are too vague by half. In a word, useless. “An aspect of the algorithm, a dial” so the same as a parameter then, according to your definition... the only distinction is it changes model production, but in my experience data does that too, so... no clarity here. You make them sound the same to the naive reader.

And you misunderstood my suggestion for the article as a request for your help. But thanks. I don’t doubt what you wrote is accurate and helpful in the same way that saying “a transom is a part of a building” is accurate and helpful.


mjburgess 2182 days ago

I had intended the term 'dial' to indicate that it is set by the practitioner. 'Data' is not.

Yes, both data and hyper-parameters are inputs to the algorithm.

I wasn't trying to offer anything more than a sketch of the terms for someone already semi-informed.

To "define" terms in a way that a person without any experience of the area could understand would require quite a long article.

My goal wasn't to answer you specifically but to take your observation as establishing a plausible interest in others for something like my comment.


ironSkillet 2184 days ago

I agree with your overall sentiment on the "definitions" offered but just fyi your tone is probably considered to be unnecessarily harsh by many.


natch 2183 days ago

Thanks for the call out, I’ll reflect on that.


ww520 2183 days ago

The main parameter in ML is θ (theta), as in Y = θ0 + θ1 X1 + θ2 X2 + ..., where the thetas are learned from the training data and (X1, X2, ...) are the features. The main goal of ML is to determine these theta parameters for a model so that you can use them to predict results on new data.

Hyperparameters in ML are the tuning parameters on the shape and structure of the model, such as the number of features in the linear regression above, the number of layers in a NN, or the number of neurons in each layer. I think basically any tuning parameter besides theta can be considered a hyperparameter. The difference is that the theta parameters are learned while the hyperparameters are decided by a human. But you can also run experiments on different tuning parameters and compare the outcomes, so in a sense hyperparameters can be learned too.
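
Here is a small sketch of that last point, using the number of features as the hyperparameter. Everything in it is made up for illustration: the data (generated from Y = 1 + 2*X1 + 3*X2), the helper names, and the choice of gradient descent as the fitting method. Theta is learned from the training rows; the feature count is picked by trying each value and comparing the error on held-out rows.

```python
def predict(theta, features):
    """Y = theta0 + theta1*X1 + theta2*X2 + ..."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))

def fit(rows, targets, alpha=0.05, steps=5000):
    """Learn theta by batch gradient descent on mean squared error."""
    num_features = len(rows[0])
    theta = [0.0] * (num_features + 1)
    n = len(rows)
    for _ in range(steps):
        errors = [predict(theta, r) - y for r, y in zip(rows, targets)]
        grads = [2 * sum(errors) / n]  # gradient for theta0
        for k in range(num_features):
            grads.append(2 * sum(e * r[k] for e, r in zip(errors, rows)) / n)
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

def mse(theta, rows, targets):
    return sum((predict(theta, r) - y) ** 2
               for r, y in zip(rows, targets)) / len(rows)

# Made-up data generated from Y = 1 + 2*X1 + 3*X2.
train_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
train_y = [1, 3, 4, 6, 8, 9]
valid_rows = [(2, 2), (3, 1), (0, 2)]
valid_y = [11, 10, 7]

# "Run experiments": try each hyperparameter value, keep the best one.
results = {}
for k in (1, 2):  # k = number of features the model is allowed to use
    theta = fit([r[:k] for r in train_rows], train_y)
    results[k] = (mse(theta, [r[:k] for r in valid_rows], valid_y), theta)

best_k = min(results, key=lambda k: results[k][0])
best_theta = results[best_k][1]
print("best number of features:", best_k)
print("learned theta:", [round(t, 3) for t in best_theta])
```

The inner `fit` call is where theta gets learned; the outer loop is the human-style tuning, just automated.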

Convolution, well, the article is trying to explain it. It's like rolling up a portion of an image using a filter, e.g. making an image blurry by pixelating it. The main purpose is to find high-level features of the image, e.g. applying a filter to find the edges of an object in the image.

A kernel is a small NxN matrix (3x3, 4x4, 16x16, etc.) used as a filter to convert the pixels in an image into a high-level feature. E.g. the mean-color kernel takes 4x4 pixels and computes the average of their colors. Now apply the mean-color kernel over all the 4x4 blocks of an image and you get one convolution.
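
A minimal sketch of both examples in plain Python, on a made-up 8x8 grayscale image (single brightness values instead of colors, to keep it short). The `convolve` helper and both kernels are illustrative, not from the article. Going over non-overlapping 4x4 blocks corresponds to a stride of 4, which is also what average pooling does.

```python
def convolve(image, kernel, stride=1):
    """Apply the kernel at every `stride`-th position (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# 4x4 mean kernel: every weight is 1/16, so the output is the block average.
mean_kernel = [[1 / 16] * 4 for _ in range(4)]

# 1x2 difference kernel: responds where brightness jumps from left to right.
edge_kernel = [[-1, 1]]

# Made-up image: left half dark (0), right half bright (8).
image = [[0, 0, 0, 0, 8, 8, 8, 8] for _ in range(8)]

blurred = convolve(image, mean_kernel, stride=4)  # one value per 4x4 block
edges = convolve(image, edge_kernel)              # nonzero only at the boundary

print(blurred)
print(edges[0])
```

The mean kernel shrinks the image to one averaged value per block, while the difference kernel is zero everywhere except the column where dark meets bright.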


natch 2180 days ago

Thanks, very helpful.


bigfoot675 2183 days ago

I agree, and I'm not sure why people are replying to you trying to help you. Your comment is a constructive critique on the link. By adding a bit more introduction, this tool could be made even more useful.
