by espadrine
3724 days ago
Just so I understand correctly: your network runs 100000 iterations, while the parent's runs 1000, but they both only use x / y positions? It feels like neurons in the first layer are weaker, because all they can do is a linear separation. Given deep networks, I was wondering if adding neurons to the first layer was better than adding them to the last one, and empirically, it feels like it is noticeably worse. I wonder if there is a theorem around that.
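To make the "linear separation" point concrete, here is a minimal sketch, not taken from either network in this thread. It assumes a hard-threshold neuron on raw (x, y) inputs; the names `neuron` and `find_separator` and the weight grid are made up for illustration. The grid search is only a demonstration, not a proof, although it is a known fact that no single line separates XOR.

```python
from itertools import product

def neuron(x, y, w1, w2, b):
    """One first-layer unit on raw (x, y): fires on one side of the line w1*x + w2*y + b = 0."""
    return 1 if w1 * x + w2 * y + b > 0 else 0

def find_separator(points):
    """Search a coarse, hypothetical grid of weights for a single neuron matching every label."""
    grid = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, b in product(grid, repeat=3):
        if all(neuron(x, y, w1, w2, b) == label for (x, y), label in points):
            return (w1, w2, b)
    return None

# AND can be split by a single line; XOR cannot.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND separator:", find_separator(AND))
print("XOR separator:", find_separator(XOR))
```

A second layer can combine several such half-planes, which is one way to read the intuition that neurons further from the raw inputs can each express more.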
Correct, but keep in mind that their method appears to use batch descent while mine does not. Batch descent often converges more quickly. There are other differences between my net and the GP's that I can spot as well (e.g., the activation function, the learning rate, and regularization).
Also keep in mind that I threw this together over breakfast, and did not spend much time tweaking parameters :)
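
For anyone unsure what the batch vs. non-batch distinction above looks like in code, here is a rough sketch on a toy one-parameter fit. It is not either network from this thread: the data, the learning rate, and the function names are all made up, and it reads "batch descent" as full-batch gradient descent, which is an assumption.

```python
# Toy data for y = 2 * x; the only parameter to learn is the slope w.
XS = [0.0, 1.0, 2.0, 3.0]
YS = [2.0 * x for x in XS]
LR = 0.05  # hypothetical learning rate, chosen so both loops are stable

def batch_descent(epochs):
    """Full-batch: average the gradient over all samples, then update once per epoch."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)
        w -= LR * grad
    return w

def per_sample_descent(epochs):
    """Online: update the weight immediately after each individual sample."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(XS, YS):  # fixed order here; real code usually shuffles
            w -= LR * 2.0 * (w * x - y) * x
    return w

print("batch:     ", batch_descent(100))
print("per-sample:", per_sample_descent(100))
```

Both loops head toward w = 2 on this toy problem. How quickly each one gets there depends on the data and the learning rate, so this shows only the difference in update schedule, not which is faster in general.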