by posterboy
3047 days ago
Your comment reminded me of myself, so maybe I read a bit too much into it. Even given Google's resources, I wouldn't be able to "solve" chess any time soon. And it's a fair guess that this applies to most people, maybe a slightly smaller percentage here, though, so I took the opportunity to provoke informed answers correcting my assumptions. I did then search for papers, so your link is appreciated, but it's all lost on me.
> they used Bayesian optimization. Although that's better than brute force, AFAIK its sample complexity is still exponential in the number of parameters.
I guess the trick is to cull the search tree by making the right moves, forcing the opponent's hand?
Hyperparameters are things like the number of layers in a model, which activation functions to use, the learning rate, the strength of momentum and so on. They control the structure of the model and the training process.
This is in contrast to "ordinary" parameters, which describe, e.g., how strongly neuron #23 in layer #2 is activated in response to the activation of neuron #57 in layer #1. The important difference between those parameters and hyperparameters is that the influence of the latter on the final model quality is hard to determine, since you need to run the complete training process before you know it.
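To make the distinction concrete, here is a minimal sketch in plain Python. The names (`hyperparameters`, `init_parameters`) and the tiny layer sizes are made up for illustration and don't come from any particular library. The hyperparameters are picked by hand before training starts; the parameters are the individual connection weights that training would then adjust.

```python
import random

# Hyperparameters: chosen before training. They fix the shape of the
# model and the settings of the training process. (Values are arbitrary.)
hyperparameters = {
    "layer_sizes": [4, 3, 2],  # 4 inputs, 3 hidden neurons, 2 outputs
    "learning_rate": 0.01,
    "momentum": 0.9,
}

def init_parameters(layer_sizes, seed=0):
    """Create one randomly initialized weight matrix per pair of adjacent layers."""
    rng = random.Random(seed)
    weights = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights.append(
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        )
    return weights

# Parameters: parameters[layer][j][i] says how strongly neuron j in the
# next layer responds to neuron i in the previous layer.
parameters = init_parameters(hyperparameters["layer_sizes"])

# Total number of weights: 4*3 + 3*2
n_parameters = sum(len(row) for matrix in parameters for row in matrix)
print(n_parameters)
```

Note that changing `layer_sizes` changes how many parameters exist at all, which is one reason the effect of a hyperparameter only shows up after a full training run.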
To specifically address your chess example, there are actually three different optimization problems involved. The first is choosing which move to make in a given position so that you win in the end. That's what the neural network is supposed to solve.
But then you have a second problem, which is to choose the right parameters for the neural network to be good at its task. To find these parameters, most neural network models are trained with some variation of gradient descent.
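As a rough illustration of that second problem, here is plain gradient descent on about the smallest "model" possible: a single parameter `w` in `y = w * x`. The data, the `train` function and its default settings are assumptions for the sketch; real networks have millions of parameters and usually use fancier variants, but the update rule has the same shape.

```python
def train(xs, ys, learning_rate=0.05, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # starting guess for the single parameter
    for _ in range(steps):
        # Derivative of (1/n) * sum((w*x - y)^2) with respect to w
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * grad
    return w

# Toy data generated by y = 2 * x, so training should recover w close to 2
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = train(xs, ys)
print(w)
```

Here `w` is the parameter being learned, while `learning_rate` and `steps` are hyperparameters that were simply picked by hand.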
And then you have the third problem of choosing the correct hyperparameters for gradient descent to work well. Some choices will just make the training process take a little longer, and others will cause it to fail completely, e.g. by getting "stuck" with bad parameters. The best ways we know to choose hyperparameters are still a combination of rules of thumb and systematic exploration of possibilities.
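The "systematic exploration" part can be as simple as trying a handful of candidate values and keeping the best one. Below is a small sketch of that idea (a grid search over the learning rate) on a toy one-parameter model; the candidate values, the function names and the data are all assumptions chosen to show the failure modes, not a recommendation. Bayesian optimization, mentioned in the quote above, is a smarter way of picking which candidates to try next, but the outer loop looks similar.

```python
def train(xs, ys, learning_rate, steps=50):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

def mean_squared_error(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated by y = 2 * x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Every candidate costs one complete training run before we can judge it
candidates = [0.0001, 0.01, 0.1, 0.5]
losses = {}
for lr in candidates:
    w = train(xs, ys, learning_rate=lr)
    losses[lr] = mean_squared_error(w, xs, ys)

best_lr = min(losses, key=losses.get)
for lr in candidates:
    print(lr, losses[lr])
print("best:", best_lr)
```

With these particular numbers, the smallest learning rate barely moves `w` in 50 steps (training is just too slow), while the largest one overshoots further on every step and the loss blows up (training fails completely). With several hyperparameters at once, the number of combinations in such a grid grows very quickly, which is why rules of thumb are still needed to narrow the search.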