by T_D_K 3193 days ago
These are called hyperparameters (number of filters, filter sizes, stride length, pooling function, activation function, and a whole host of others not mentioned in this article). They are chosen "randomly" only in the sense that picking them isn't an exact science, i.e. there is no single "right" answer. In practice, intuition and experience guide you toward reasonable values.
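
To make "hyperparameters" a bit more concrete, here's a rough sketch of how a few of them fix the shape of a conv + pool stage before any training happens. The dict keys, the `output_size` helper, and the specific numbers (28x28 input, 5x5 filters, and so on) are all made up for illustration; only the output-size formula is standard.

```python
# Hypothetical hyperparameter choices for one conv + pool stage.
# These are picked by hand (or by search); gradient descent never touches them.
hyperparams = {
    "input_size": 28,   # assumed square input, e.g. a 28x28 image
    "num_filters": 8,
    "filter_size": 5,
    "stride": 1,
    "padding": 0,
    "pool_size": 2,
    "pool_stride": 2,
}


def output_size(input_size, window, stride, padding=0):
    """Spatial size after sliding a window over a padded input."""
    return (input_size + 2 * padding - window) // stride + 1


conv_out = output_size(
    hyperparams["input_size"],
    hyperparams["filter_size"],
    hyperparams["stride"],
    hyperparams["padding"],
)
pool_out = output_size(conv_out, hyperparams["pool_size"], hyperparams["pool_stride"])

print("after conv:", (conv_out, conv_out, hyperparams["num_filters"]))  # (24, 24, 8)
print("after pool:", (pool_out, pool_out, hyperparams["num_filters"]))  # (12, 12, 8)
```

Change the stride or filter size and the downstream layer sizes change with it, which is part of why these choices end up being trial and error.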

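The values in the filter matrices and the weights and biases of the fully connected layers, on the other hand, really do start out random. They are often initialized with small Gaussian random values. Biases are sometimes just initialized to all 0's or a small constant, but setting every weight to the same value (all 1's or all 0's) is generally avoided: the units in a layer then receive identical gradient updates and never learn different features. Again, there's no single "right" answer, though there is research recommending particular schemes (Xavier/Glorot and He initialization are common ones). These are the values that are trained using gradient descent.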
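
To make the initialization point concrete, here's a rough sketch comparing Gaussian init with constant init. The tiny network (2 inputs, 2 tanh hidden units, 1 linear output), the helper names, the input and target values, and the 0.1 standard deviation are all assumptions for the demo, not anything from the article. With constant weights, both hidden units get exactly the same gradient, which is one reason constant weight init is usually avoided in practice.

```python
import math
import random


def init_gaussian(rows, cols, std=0.1, seed=0):
    """Weights drawn from a zero-mean Gaussian (std is an arbitrary choice here)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def init_constant(rows, cols, value):
    """Every weight set to the same value."""
    return [[value] * cols for _ in range(rows)]


def hidden_gradients(w_hidden, w_out, x, target):
    """Gradient of 0.5 * (y - target)^2 w.r.t. each hidden unit's weight row."""
    # Forward pass: tanh hidden layer, linear output.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sum(wo * hi for wo, hi in zip(w_out, h))
    err = y - target
    # Backward pass: chain rule through the output weight and tanh.
    grads = []
    for j in range(len(w_hidden)):
        dh = err * w_out[j] * (1.0 - h[j] ** 2)
        grads.append([dh * xi for xi in x])
    return grads


x = [0.5, -1.0]
target = 1.0
w_out = [1.0, 1.0]

const_grads = hidden_gradients(init_constant(2, 2, 1.0), w_out, x, target)
gauss_grads = hidden_gradients(init_gaussian(2, 2), w_out, x, target)

# Constant init: both hidden units see the same gradient, so they stay identical.
print("constant init, identical gradients:", const_grads[0] == const_grads[1])
# Gaussian init: the units start different and get different gradients.
print("gaussian init, identical gradients:", gauss_grads[0] == gauss_grads[1])
```

A gradient descent step is then just `w -= learning_rate * grad` for each weight, so units that start identical and get identical gradients never diverge.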