by air7 3192 days ago
> Parameters like number of filters, filter sizes, architecture of the network etc. have all been fixed before Step 1 and do not change during training process – only the values of the filter matrix and connection weights get updated.

Is this just the article's over-simplification or are these values really just randomly selected?

1 comment

These are called hyperparameters (number of filters, filter sizes, stride length, pooling function, activation function, and a whole host of others not mentioned in this article). They are chosen "randomly" only in the sense that it isn't an exact science, i.e. there is no single "right" answer. In practice, intuition and experience are used as a guide to select reasonable values, and candidate settings are usually compared by how well the trained network does on held-out data.
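To make the split concrete, here is a minimal sketch in plain Python. The names (`hyperparams`, `conv_output_size`, `parameter_shapes`) and all the numbers are made up for illustration and do not come from the article. It assumes a square single-channel input, one convolution layer with no padding, one pooling layer, and one fully connected layer. The point is that the hyperparameters fix the shapes of the arrays that training later fills in.

```python
# Hyperparameters: picked by hand before training (illustrative values only).
hyperparams = {
    "input_size": 28,   # width/height of a square, single-channel input
    "num_filters": 8,
    "filter_size": 3,
    "stride": 1,
    "pool_size": 2,
    "num_classes": 10,
}


def conv_output_size(input_size, filter_size, stride):
    """Spatial size after a convolution with no padding."""
    return (input_size - filter_size) // stride + 1


def parameter_shapes(hp):
    """Shapes of the learned parameters implied by the hyperparameters."""
    conv_out = conv_output_size(hp["input_size"], hp["filter_size"], hp["stride"])
    pooled = conv_out // hp["pool_size"]             # non-overlapping pooling
    flat = pooled * pooled * hp["num_filters"]       # input to the dense layer
    return {
        "filters": (hp["num_filters"], hp["filter_size"], hp["filter_size"]),
        "filter_biases": (hp["num_filters"],),
        "fc_weights": (flat, hp["num_classes"]),
        "fc_biases": (hp["num_classes"],),
    }


shapes = parameter_shapes(hyperparams)
for name, shape in shapes.items():
    print(name, shape)
```

Changing `num_filters` or `filter_size` changes these shapes, which is why they have to be settled before Step 1. Gradient descent only ever changes the numbers stored inside arrays of these shapes.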

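The values in the filter matrices and the weights of the fully connected layers, on the other hand, really are initialized randomly. They are often drawn from a Gaussian (or uniform) distribution with small variance. Biases are commonly just set to a constant such as 0. Setting all the weights to the same constant (all 1's or all 0's) is generally avoided: every unit in a layer would then compute the same output and receive the same gradient update, so the units could never learn different features. There's no single "right" answer here either, but there is research that recommends particular schemes, for example Xavier/Glorot and He initialization, which scale the variance of the random values by the layer size. These are the values that are trained using gradient descent.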
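A small sketch of the initialization options, again in plain Python. The helper names (`gaussian_init`, `constant_init`, `weighted_sums`) and the input values are hypothetical. The `2 / fan_in` variance is the He-style scaling, used here as one common choice rather than the only one. The sketch compares what three units in one layer compute for the same input under a constant initialization and under a Gaussian one.

```python
import math
import random


def gaussian_init(fan_in, fan_out, rng):
    """Weights drawn from N(0, 2 / fan_in), a He-style scaling."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def constant_init(fan_in, fan_out, value):
    """Every weight set to the same value."""
    return [[value] * fan_out for _ in range(fan_in)]


def weighted_sums(x, weights):
    """Pre-activation of each unit: sum_i x[i] * weights[i][j]."""
    fan_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(fan_out)]


rng = random.Random(0)          # fixed seed so the run is repeatable
x = [0.5, -1.0, 2.0, 0.25]      # one made-up input vector

z_const = weighted_sums(x, constant_init(4, 3, 1.0))
z_rand = weighted_sums(x, gaussian_init(4, 3, rng))

print("constant init:", z_const)   # all three units agree
print("gaussian init:", z_rand)    # units differ from each other
```

With constant weights the three units are indistinguishable, and since they also receive identical gradients they stay that way during training. Random initialization gives each unit a different starting point.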