Hopefully this helps (correct me if I'm wrong, I'm still learning about neural nets):
Think of the whole neural net as a function:
input * weight = output
At each iteration, we feed in the input to the neural net. Then the neural net compares what output it gets to the correct output.
For example, input1 is 5, and the correct output for input1 should have been 2. But the neural net got 3 as the output. So it then decreases the weights slightly so it would get 2.75 next time it has input of 5. Repeat thousands of times. That's the basic idea for machine learning and neural networks.
The algorithm it uses to figure out how much to decrease the weights is called "backpropagation" which uses gradient descent. To explain gradient descent, as as a roller coaster track. Imagine the roller coaster starts off on a random location on the track. Then gravity takes the roller coaster down the track until it ends up on a low point between two hills and stays there. This is the new location of the roller coaster. This new location is nice because it has the lowest energy the roller coaster could find, so it stays there. (We use derivatives to figure out the slope of a curve, which then gives us the direction where the curve goes downhill).
In neural networks, the roller coaster curve is the "cost function", which basically calculates the amount of difference between the neural net's output and the actual correct output it should have got. The initial weight is the roller coaster's initial position. The new weight is the roller coaster's final position, at the bottom of the cost function curve. This new position thus gives us the lowest cost.
Note that there may be even lower valleys, but when we roll the rollercoaster it stops at its nearest low valley. This is why we randomize the weights at the beginning - to put the roller coaster near possibly even lower valleys.
Okay, so it works by minimizing (equiv. maximizing) some function. But that doesn't say much about how it "learns" the gradient. What function does it care about? Average squared error (predict_prob-Z_i)^2 ? Average absolute error? The likelihood function of some assumed distribution? Maximum distance between the classification border and closest observed points? If I saw someone carrying a bag full of blueberries and some bread home from the grocery store and asked to know how they chose to buy that, to which they replied "I had a list of characteristics which I thought where important for groceries to have in this trip to the store. For each grocery item, I recorded a vector of degrees to which the item possesses each of those characteristics. Finally, I chose the group of groceries that had the best combination of degree vectors", I still wouldn't really know anything about why they bought the blueberries and bread.
The function it minimizes is called the "loss function", and its value for the training and test sets are shown in the upper right area. AFAICT the site doesn't say how it's computed, but I think it's average squared error. The gradient is not learned; if you think of the loss function as a real-valued function of the weights, the gradient is just the partial derivatives with respect to the weights.
It's training a neural network to classify a data set with two classes (orange or blue) and the data has two features (x1 or x2). All the orange and blue dots are the training data. So if you take a dot on the graph with coordinates (-2, 4) and it's blue, that would mean that a data point with x1 = -2 and x2 = 4 has the class blue.
You can think of a neural network as a function that can take in arbitrary features (in this case x1 and x2) and tries to output the correct class. That's what the orange and blue colors in the background are, the neural network's guess at the correct classification for any given point (x1, x2).
When you hit play, it iterates through the training data making adjustments to each neuron in the network so that it gets closer to predicting the right class.
If you want to see how well the neural network performs on data it wasn't trained on, you can click "show test data".
Yeah I feel like we need some decent understanding of neural networks to have more context on this. Its kind of like being given a specialized shovel but not knowing why you need it or why you should dig holes.
I think it's a playground in the best sense of the term. Take some time and actually play with it, and a lot of fun stuff happens, lightbulbs go off, etc.
If you're expecting a lesson, you'll likely be disappointed, but I think there's real value in a true playground.
I think the biggest improvement would be if, when hovering over a 'neuron', you get a visual representation of what feeds into it.
For me in Chrome on OSX you do get a visual representation of the neuron's input when hovering. It shows up behind the data points in place of the neurons' output when hovering.
It begins training the network using the backpropagation algorithm.
> Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure.
On each iteration, it calculates how bad the predicted output is, then adjusts the weights between neurons to lessen that value. Google backpropagation for more info
Or perhaps explain how all the different inputs influence the result? I more or less get that it's just iterating over the data to approximate the given data set when you press play but I have no idea how giving it more or less neurons changes that, to name an example.
Basically each input gets multiplied by some weight that gets adjusted through each iteration. The product of the input and weight gets put through an activation function, and the outcome of that can be interpreted as the network's prediction of the class.
So you see the first neuron's input is just x1. You can see in the little graph at x1 that it's split down the middle with orange on one side and blue on the other. You can think of adjusting the weight on that neuron as adjusting where along the x axis the split occurs. All points on the orange side are classified orange and all on the blue side are classified blue. If you picked a data set like the spiral one or whatever, that neuron alone isn't going to make very many correct classifications. That's because it only gets the x1 value as input and can only affect the output by multiplying x1 by some weight, which would only have the affect of shifting the classification boundary left or right. You can see the same thing happening for the second neuron with input x2 except that now it splits along the y axis. Again that alone isn't going to match the data very well.
But then you get to the second layer, and the input of each neuron in the second layer is the output of each neuron in the first layer. So these neurons are able to take into consideration both x1 and x2 and are able to divide the data in more complex ways. So you can think of the neurons in each layer of the neural network as being able to consider more and more complex properties of the data in forming its output.
The chart at the right is the output/result of the neural network's training. In the foreground you see actual data points that are used to train the neural net: to "teach" it how to classify orange or blue (unless you choose "regression" in which case it computes a numeric value). In the background you see the gradient that is formed by the network. The goal is to make the gradient form around the data points by color as closely as possible.
The neural network is essentially the nodes in the middle, linked together by various weights. During training, the test data points are fed forward into the network, creating an output. That output is then fed backward using something called "back propagation" which is used to adjust the weights.
Typically, the more hidden layers or nodes per layer, the more difficult gradients that can be learned. Zero hidden layers essentially forms a linear gradient that can only be used to split very basic, linearly-separable data (drawing a straight line to separate the different types)
Neural networks have lots of little knobs and levers you can adjust. That's what all these inputs are that you see.
Think of the whole neural net as a function:
input * weight = output
At each iteration, we feed in the input to the neural net. Then the neural net compares what output it gets to the correct output.
For example, input1 is 5, and the correct output for input1 should have been 2. But the neural net got 3 as the output. So it then decreases the weights slightly so it would get 2.75 next time it has input of 5. Repeat thousands of times. That's the basic idea for machine learning and neural networks.
The algorithm it uses to figure out how much to decrease the weights is called "backpropagation" which uses gradient descent. To explain gradient descent, as as a roller coaster track. Imagine the roller coaster starts off on a random location on the track. Then gravity takes the roller coaster down the track until it ends up on a low point between two hills and stays there. This is the new location of the roller coaster. This new location is nice because it has the lowest energy the roller coaster could find, so it stays there. (We use derivatives to figure out the slope of a curve, which then gives us the direction where the curve goes downhill).
In neural networks, the roller coaster curve is the "cost function", which basically calculates the amount of difference between the neural net's output and the actual correct output it should have got. The initial weight is the roller coaster's initial position. The new weight is the roller coaster's final position, at the bottom of the cost function curve. This new position thus gives us the lowest cost.
Note that there may be even lower valleys, but when we roll the rollercoaster it stops at its nearest low valley. This is why we randomize the weights at the beginning - to put the roller coaster near possibly even lower valleys.