	by CGamesPlay 3725 days ago
	Neat stuff, fun to play with. I wasn't able to get a net to classify the Swiss roll. Last time I played around with this stuff, I found the single biggest factor in success was the optimizer used. Is this just using simple gradient descent? I would like to see a dropdown for different optimizers.


8note 3725 days ago

http://imgur.com/ypBQEWx

Add some noise, use all the inputs, and use one 8-wide hidden layer

edit: works better with a sigmoid activation curve, but it converges more slowly


andrewtbham 3725 days ago

Yeah, you're on the right track. A nice pattern emerges on this after 160 iterations.

http://playground.tensorflow.org/#activation=tanh&batchSize=...


rmellow 3725 days ago

Using sin, cos, x1, x2 with one six-neuron hidden layer does the trick quickly: http://imgur.com/UMv5gsH

No need to mess with noise or regularization :)


makeset 3725 days ago

> Add some noise

This actually makes the dataset harder to fit to. It is not the same thing here as the "training with noise" method where random noise would be added to each batch, as an alternative means of Tikhonov regularization.
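To make the distinction concrete, here is a minimal NumPy sketch, not the playground's actual code. `make_spiral` is a hypothetical generator that bakes noise into the dataset once (which is what a dataset noise setting amounts to), while `noisy_batches` is a hypothetical helper for the "training with noise" scheme, drawing fresh Gaussian noise for every mini-batch. The spiral shape, the constants, and the parameter names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_spiral(n_per_class, noise=0.0):
    """Two interleaved spirals; noise is added once, when the data is generated."""
    t = np.linspace(0.0, 1.0, n_per_class)
    radius = 5.0 * t                      # assumed scale, for illustration only
    angle = 1.75 * 2.0 * np.pi * t
    points, labels = [], []
    for label, offset in ((1, 0.0), (-1, np.pi)):
        xy = np.stack([radius * np.sin(angle + offset),
                       radius * np.cos(angle + offset)], axis=1)
        # Fixed perturbation: the training set itself becomes noisier.
        xy += rng.uniform(-1.0, 1.0, size=xy.shape) * noise
        points.append(xy)
        labels.append(np.full(n_per_class, label))
    return np.concatenate(points), np.concatenate(labels)

def noisy_batches(X, y, batch_size, sigma, epochs):
    """Yield mini-batches with fresh Gaussian input noise on every draw.

    The stored data X is never modified; only the copy handed to the
    optimizer is perturbed, so each epoch sees different noise.
    """
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx] + rng.normal(0.0, sigma, size=(len(idx), X.shape[1])), y[idx]
```

In the first case the net has to fit a fixed set of perturbed points; in the second the clean points stay the targets and the noise acts as a smoothing penalty averaged over many draws.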


8note 3724 days ago

With that particular dataset, it looks like the noise really just adds more data and, more importantly, fills in the gaps along the spirals, which is where my setup was having trouble.

The noise doesn't go far enough to start confusing points between different clusters, but it adds more points.

That said, my knowledge of neural nets is fairly limited.


cglace 3725 days ago

Using all inputs and 6 layers of varying sizes, after about 500 iterations: http://i.imgur.com/x1MOpvl.jpg


visarga 3724 days ago

Just 100 iterations, learning rate 0.03, activation tanh, regularization L2, rate 0.01. The network is 8,8,8 neurons per layer.
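For readers who want to try these settings outside the browser, here is a rough NumPy sketch of a tanh network with L2 weight decay trained by plain gradient descent. It is an illustration under stated assumptions, not the playground's implementation: the helper names (`init_params`, `forward`, `loss`, `gradients`, `sgd_step`), the squared-error loss, the tanh output unit, and the weight initialization are all choices made here.

```python
import numpy as np

def init_params(sizes, rng):
    """One (W, b) pair per layer, e.g. sizes = [2, 8, 8, 8, 1]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer, input included (tanh throughout)."""
    acts = [X]
    for W, b in params:
        acts.append(np.tanh(acts[-1] @ W + b))
    return acts

def loss(params, X, y, l2):
    """Half mean squared error plus an L2 penalty on the weights (not the biases)."""
    pred = forward(params, X)[-1][:, 0]
    penalty = sum(np.sum(W * W) for W, _ in params)
    return 0.5 * np.mean((pred - y) ** 2) + 0.5 * l2 * penalty

def gradients(params, X, y, l2):
    """Backpropagate the loss above; returns one (dW, db) pair per layer."""
    acts = forward(params, X)
    # Gradient of the data term with respect to the output activation.
    delta = (acts[-1][:, 0] - y)[:, None] / len(X)
    grads = []
    layers = zip(reversed(params), reversed(acts[:-1]), reversed(acts[1:]))
    for (W, _), a_in, a_out in layers:
        delta = delta * (1.0 - a_out ** 2)        # back through tanh
        grads.append((a_in.T @ delta + l2 * W, delta.sum(axis=0)))
        delta = delta @ W.T                       # back to the layer's input
    return grads[::-1]

def sgd_step(params, grads, lr):
    """One plain gradient-descent update."""
    return [(W - lr * dW, b - lr * db) for (W, b), (dW, db) in zip(params, grads)]
```

With the values quoted above, one update would be `params = sgd_step(params, gradients(params, X, y, l2=0.01), lr=0.03)`, where `X` holds the input features row by row and `y` holds labels in {-1, 1}; `sizes` would start with the number of input features and end with 1.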


Obi_Juan_Kenobi 3725 days ago

Using the defaults, I had success at about 300 iterations with all the inputs and 5 hidden layers, each with a decreasing number of neurons (i.e. 6,5,4,3,2).

I don't know whether needing fewer neurons with each layer is a general rule, but it seems to work here.


chestervonwinch 3725 days ago

Which optimization algorithms did you have the most success with? Were they more successful in the sense of a better out-of-sample error rate, or in the sense of quicker convergence (or something else)?
