So you assign random values to connection weights and then ‘spin’ those weights to a combination of other random values that hopefully perform a bit more favourably... isn’t this just random search?
It's not a random search through the parameter space:
"But how do we select a good network from these K^n different networks? Brute-force evaluation of all possible configurations is clearly not feasible due to the massive number of different hypotheses. Instead, we present an algorithm, shown in Figure 1, that iteratively searches the best combination of connection values for the entire network by optimizing the given loss. To do this, the method learns a real-valued quality score for each weight option. These scores are used to select the weight value of each connection during the forward pass. The scores are then updated in the backward pass based on the loss value in order to improve training performance over iterations."
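For what it's worth, here is a rough sketch of what that quote describes, on a toy linear model rather than a real network. The candidate values stay fixed and random; only the per-option scores change. Everything named here (train_slot_scores, k_options, lr, the toy data) is made up for illustration, and the score update is a straight-through-style estimate that I'm assuming, since the quoted passage doesn't spell out the exact backward rule.

```python
import random

def train_slot_scores(data, n_inputs, k_options=8, lr=0.05, epochs=100, seed=0):
    """Toy sketch: pick one of K fixed random values per connection via learned scores."""
    rng = random.Random(seed)
    # Fixed random candidate values per connection; these are never trained.
    values = [[rng.uniform(-1.0, 1.0) for _ in range(k_options)] for _ in range(n_inputs)]
    # One real-valued quality score per candidate; these are the only learned numbers.
    scores = [[rng.uniform(0.0, 0.01) for _ in range(k_options)] for _ in range(n_inputs)]

    def selected_weights():
        # Forward pass selection: each connection uses its highest-scoring candidate.
        picked = []
        for vals, sc in zip(values, scores):
            best_k = max(range(k_options), key=lambda k: sc[k])
            picked.append(vals[best_k])
        return picked

    def mean_loss():
        w = selected_weights()
        total = 0.0
        for x, target in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            total += 0.5 * (pred - target) ** 2
        return total / len(data)

    losses = [mean_loss()]
    for _ in range(epochs):
        for x, target in data:
            w = selected_weights()
            err = sum(wi * xi for wi, xi in zip(w, x)) - target
            for i in range(n_inputs):
                grad_w = err * x[i]  # dL/dw_i for the squared-error loss
                for k in range(k_options):
                    # Assumed update: push each score along grad_w times its candidate value.
                    scores[i][k] -= lr * grad_w * values[i][k]
        losses.append(mean_loss())
    return values, selected_weights(), losses

# Hypothetical toy data generated from the weights [0.5, -0.3].
toy_data = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.3), ([1.0, 1.0], 0.2)]
candidates, chosen, loss_history = train_slot_scores(toy_data, n_inputs=2)
```

The point of the sketch is that the loss gradient steers which candidate gets picked, so the search is guided rather than blind, even though no weight value is ever changed.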
Random search is a technical term in optimization with a very specific meaning (which unfortunately does not mean searching random locations in parameter space à la brute force). It’s more in the spirit of randomly deciding the direction in which to try to take the next step, thereby implicitly deriving a gradient component by sampling.
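To make that concrete, a minimal sketch of random search in this sense: sample a random direction, try a step along it, and keep the step only if the objective improves. The function name, the step-shrinking heuristic and the test objective are my own choices for illustration, not taken from any particular reference.

```python
import math
import random

def random_search(f, x0, step=0.5, iters=2000, seed=0):
    """Local random search: step in a random direction, keep it only if f decreases."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        # Draw a random direction and normalise it to unit length.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(di * di for di in d)) or 1.0
        candidate = [xi + step * di / norm for xi, di in zip(x, d)]
        fc = f(candidate)
        if fc < fx:
            x, fx = candidate, fc  # accept the improving step
        else:
            step *= 0.99  # assumed heuristic: shrink the step after a failed try
    return x, fx

def sphere(v):
    # Simple convex test objective with its minimum at the origin.
    return sum(vi * vi for vi in v)

best_x, best_f = random_search(sphere, [3.0, -2.0])
```

Accepted steps are by construction descent directions, which is the sense in which sampling stands in for a gradient here.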
It reminds me of Bayesian model sampling, where you have a distribution over possible weights and 'draw' a model from the distribution for each evaluation... A problem is that there may be interesting co-dependencies amongst the weights which independent sampling will have a hard time getting right.
"But how do we select a good network from these K^n different networks? Brute-force evaluation of all possible configurations is clearly not feasible due to the massive number of different hypotheses. Instead, we present an algorithm, shown in Figure 1, that iteratively searches the best combination of connection values for the entire network by optimizing the given loss. To do this, the method learns a real-valued quality score for each weight option. These scores are used to select the weight value of each connection during the forward pass. The scores are then updated in the backward pass based on the loss value in order to improve training performance over iterations."
It's actually pretty clever.