by chestervonwinch 4412 days ago
Mostly, no. Hidden units introduce non-convexity to the cost. How about a simple counter-example?

Take a simple classifier network with one input, one hidden unit, one output, and no biases. To make things even simpler, tie the two weights, i.e. make the first weight equal to the second. The output of the network can then be written as z = f(w * f(w * x)), where f() is the sigmoid.
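
To make the setup concrete, here is a minimal sketch of that tied-weight network in Python. The names sigmoid and network_output are just my own labels for illustration, not anything standard.

```python
import math

def sigmoid(a):
    # Logistic sigmoid f(a) = 1 / (1 + exp(-a)); fine for moderate |a|.
    return 1.0 / (1.0 + math.exp(-a))

def network_output(w, x):
    # One input, one hidden unit, one output, no biases.
    # The same weight w is used for both layers (tied weights).
    hidden = sigmoid(w * x)
    return sigmoid(w * hidden)
```

With w = 0 both layers see a zero pre-activation, so network_output(0.0, x) is 0.5 for any x.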

Next, consider a dataset with two items: [(x_1, y_1), (x_2, y_2)] where x_i is the input and y_i is the class label, 0 or 1. Take as values: [(0.9, 1), (0.1, 0)]. The cost function (log-likelihood in this case) is:

L(w) = sum_i { y_i * log( f(w * f(w * x_i)) ) + (1-y_i) * log( 1-f(w * f(w * x_i)) ) }

or

L(w) = log( f(w * f(w * 0.9)) ) + log( 1-f(w * f(w * 0.1)) )

Plot that last expression with f replaced by the sigmoid, and you'll see it's not concave (so the negative log-likelihood, which is the cost you'd actually minimize, is not convex): there's a small dip at w = 0.
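
If you'd rather check numerically than plot, here's a rough sketch. It evaluates the log-likelihood for the two data points and tests the midpoint inequality that a convex function must satisfy, using the negative log-likelihood as the cost. The function names and the choice of the interval [-1, 1] are my own assumptions for illustration.

```python
import math

def sigmoid(a):
    # Logistic sigmoid; adequate for the moderate values of w used here.
    return 1.0 / (1.0 + math.exp(-a))

# The two-item dataset from the example: (input, label).
DATA = [(0.9, 1), (0.1, 0)]

def log_likelihood(w, data=DATA):
    # L(w) = sum_i y_i*log(z_i) + (1 - y_i)*log(1 - z_i),
    # with z_i = f(w * f(w * x_i)) and tied weights.
    total = 0.0
    for x, y in data:
        z = sigmoid(w * sigmoid(w * x))
        total += y * math.log(z) + (1 - y) * math.log(1.0 - z)
    return total

def cost(w):
    # Negative log-likelihood: the quantity you would minimize.
    return -log_likelihood(w)

def violates_convexity(fn, a, b):
    # A convex fn satisfies fn(midpoint) <= average of the endpoint values.
    mid = 0.5 * (a + b)
    return fn(mid) > 0.5 * (fn(a) + fn(b))

if __name__ == "__main__":
    for w in (-1.0, 0.0, 1.0):
        print(f"w={w:+.1f}  cost={cost(w):.4f}")
    print("convexity violated on [-1, 1]:", violates_convexity(cost, -1.0, 1.0))
```

The cost at w = 0 sits above the average of the costs at w = -1 and w = 1, which a convex function can't do, so one violated midpoint is enough to settle it.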