Hacker News
by hughes 3919 days ago
I guess it works for a few specific cases you mentioned, but the example animation at the end diverges after twelve steps [1].

[1] http://imgur.com/a/gcUCH


It does seem like it wasn't trained in an optimal fashion.

- The conv layer has only 20 ReLU units, and I'm not sure that suffices to memorize the rules. The net might be "underfitting" the rule set and therefore making mistakes on rare pixel arrangements.

- More worryingly, it's also strange that the author uses a fully connected layer right after the conv layer (ConvNetJS adds this implicitly when you specify a loss layer). This means that every output neuron is a function of the CONV layer activations at every position, while the game of life rules are local. The way to do this would be to instead put a 1x1 CONV layer with 2 neurons on top of the first 3x3 CONV layer and interpret it as computing the class scores at every spatial position. In that case, however, you'd want to apply the loss at every spatial position, and this "fully-convolutional" loss is not supported out of the box in ConvNetJS, though it could be written.

- However, with a fully-convolutional loss, the zero-padding of 1 used around the borders might cause trouble. This is normally fine for images, but here the neurons all share parameters spatially (and hence compute the same function), so they don't "know" whether they are at the border or in the middle of the image. I'm not sure how this game of life handles boundary conditions, but if the borders obey different dynamics, you'd want to distinguish the border pixels with a special "border" feature vector at the input. E.g. each pixel is a 3-vector with a 1-hot encoding for (border, positive pixel, negative pixel).

- And it's also strange that the author uses a "regression" loss for something that is a binary classification problem.
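
A minimal NumPy sketch of the fully-convolutional setup described in the points above, for illustration only: it is not ConvNetJS code and not the author's network. It assumes every cell outside the grid is dead, encodes each cell with the suggested (border, alive, dead) one-hot vector plus a one-cell ring of "border" cells (so no zero padding is needed), applies a 3x3 CONV layer with 20 ReLU units followed by a 1x1 CONV layer with 2 outputs, and averages a softmax cross-entropy loss over every spatial position. The helper names (`life_step`, `encode`, `conv3x3_valid`, `per_position_loss`) are hypothetical, the weights are random, and only the forward pass and loss are shown, not training.

```python
import numpy as np

def life_step(grid):
    """One game of life step, assuming every cell outside the grid is dead."""
    h, w = grid.shape
    padded = np.pad(grid, 1)  # ring of dead (0) cells
    # Count the 8 neighbours of every cell by summing shifted views.
    neighbours = sum(
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(np.int64)

def encode(grid):
    """(border, alive, dead) one-hot channels, with a one-cell ring of 'border' cells."""
    h, w = grid.shape
    x = np.zeros((3, h + 2, w + 2))
    x[0] = 1.0                      # mark everything as border ...
    x[0, 1:-1, 1:-1] = 0.0          # ... except the interior
    x[1, 1:-1, 1:-1] = grid == 1    # alive cells
    x[2, 1:-1, 1:-1] = grid == 0    # dead cells
    return x

def conv3x3_valid(x, w, b):
    """Unpadded 3x3 convolution: x (C, H+2, W+2), w (K, C, 3, 3), b (K,) -> (K, H, W)."""
    h, wd = x.shape[1] - 2, x.shape[2] - 2
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("kc,chw->khw", w[:, :, dy, dx], x[:, dy:dy + h, dx:dx + wd])
    return out + b[:, None, None]

def per_position_loss(scores, target):
    """Softmax cross-entropy at every position, averaged: scores (2, H, W), target (H, W)."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(target.shape)
    return -log_probs[target, rows, cols].mean()

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(8, 8))
target = life_step(grid)

w1, b1 = rng.normal(0.0, 0.1, size=(20, 3, 3, 3)), np.zeros(20)  # 3x3 CONV, 20 ReLU units
w2, b2 = rng.normal(0.0, 0.1, size=(2, 20)), np.zeros(2)         # 1x1 CONV, 2 class scores

hidden = np.maximum(0.0, conv3x3_valid(encode(grid), w1, b1))     # (20, 8, 8)
scores = np.einsum("kc,chw->khw", w2, hidden) + b2[:, None, None]  # (2, 8, 8)
loss = per_position_loss(scores, target)
print(scores.shape, loss)
```

Because the 3x3 filters only see a cell's own neighbourhood and the 1x1 layer mixes channels at a single position, each output score depends only on local input, which matches the locality of the rules.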

So, nice attempt, but several funny choices, and it's clear why it didn't fully work :)
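
To put the first point in perspective: the next state of a cell depends only on its 3x3 neighbourhood, so the complete rule set is a lookup table over 2^9 = 512 patches. A small sketch (hypothetical helper `life_rule`, assuming the standard birth-on-3, survive-on-2-or-3 rules) enumerates it:

```python
from itertools import product

def life_rule(patch):
    """Next state of the centre cell of a 3x3 patch given as 9 bits in row-major order."""
    centre = patch[4]
    neighbours = sum(patch) - centre
    return int(neighbours == 3 or (centre == 1 and neighbours == 2))

# Enumerate every possible 3x3 neighbourhood: 2**9 = 512 of them.
table = {patch: life_rule(patch) for patch in product((0, 1), repeat=9)}
alive = sum(table.values())
print(len(table), alive)  # prints: 512 140
```

Of the 512 patterns, 140 produce a live cell (56 births plus 84 survivals), and that table is the entire local function the network has to capture.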

Glad to see a critical eye on these sorts of things. I had a question about the choice of loss function. Regression is definitely the wrong choice, since our outputs are binary (0 or 1) rather than real-valued (anywhere in 0-1).

We could use a softmax classifier, since it is meant for multi-class classification. Each class here would represent a neighboring pixel, and predicting that class would mean activating that pixel. However, softmax assumes a single label per example, with probabilities that add up to 1, whereas our problem is multi-label: several pixels can be active at once, and each one is its own binary decision.

We could train nine such softmax groupings, one for each neighboring pixel, but that immediately seems like a bad idea.
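
One observation on the nine-softmax idea, sketched in NumPy under the assumption that each pixel gets an (off, on) pair of scores: a two-way softmax gives exactly the logistic sigmoid of the score difference, so nine independent two-way softmaxes are equivalent to nine independent sigmoid outputs trained with binary cross-entropy, the standard loss for multi-label problems. The scores and target patch are made-up values, and `two_way_softmax_on` and `multilabel_bce` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_way_softmax_on(score_off, score_on):
    """Probability of the 'on' class under a 2-class softmax (max-shifted for stability)."""
    m = np.maximum(score_off, score_on)
    e_off, e_on = np.exp(score_off - m), np.exp(score_on - m)
    return e_on / (e_off + e_on)

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy of independent per-pixel logits (stable form)."""
    return np.mean(np.maximum(logits, 0.0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

rng = np.random.default_rng(0)
s_off, s_on = rng.normal(size=9), rng.normal(size=9)  # made-up scores for 9 pixels
print(np.allclose(two_way_softmax_on(s_off, s_on), sigmoid(s_on - s_off)))  # True

targets = np.array([0, 1, 0, 1, 1, 0, 0, 0, 1])  # made-up 3x3 target patch, flattened
loss = multilabel_bce(s_on - s_off, targets)
print(loss)
```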

Another awful solution: make each possible pixel configuration (2^9 = 512 of them) a class and use a standard softmax.
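
For concreteness, the 512-class encoding amounts to reading the 3x3 patch as a 9-bit integer. A minimal sketch, with hypothetical helper names and an arbitrary row-major, least-significant-bit-first ordering:

```python
import numpy as np

def patch_to_class(patch):
    """Map a 3x3 binary patch to a class index in [0, 512), row-major, bit 0 = top-left."""
    bits = np.asarray(patch, dtype=np.int64).reshape(9)
    return int(bits @ (1 << np.arange(9)))

def class_to_patch(k):
    """Inverse of patch_to_class: class index -> 3x3 array of 0/1."""
    return ((k >> np.arange(9)) & 1).reshape(3, 3)

glider = [[0, 1, 0],
          [0, 0, 1],
          [1, 1, 1]]
k = patch_to_class(glider)
print(k, class_to_patch(k).tolist() == glider)  # prints: 482 True
```

A 512-way softmax would then output one probability per configuration, with no notion that two configurations differing in a single pixel are related.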

I can start to see why the author chose regression loss as a sort of hack to get this to work, but I'm trying to think through the best, proper solution.

Any thoughts?

Comments like this, and Hinton's Coursera course, remind me that there's a whole "Black Art" to training these systems.

Do you think a similar approach could learn to generalize the Navier-Stokes equations from observed fluid flows?

It's not any more of a black art than programming is. There's no magic to it; you just have to know what you're doing. Also, you should use some kind of scheme (like Bayesian optimization) to tune the hyperparameters across many experimental models. That's the best way to make sure you're getting the best results, but of course it takes a while, so you have to be selective about which parameters you optimize and which combinations you allow.
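
A compact sketch of Bayesian optimization over a single hyperparameter, assuming a Gaussian-process surrogate with a fixed RBF length scale and an expected-improvement acquisition evaluated on a grid of candidates. The objective is a hypothetical stand-in for "validation loss as a function of a hyperparameter rescaled to [0, 1]", and all function names are illustrative. Practical tools also fit the kernel's own hyperparameters and search several dimensions at once, which this sketch does not.

```python
import math
import numpy as np

def rbf(a, b, length=0.1):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_query."""
    chol = np.linalg.cholesky(rbf(x_obs, x_obs) + noise * np.eye(len(x_obs)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    k_star = rbf(x_obs, x_query)                     # shape (n_obs, n_query)
    v = np.linalg.solve(chol, k_star)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return k_star.T @ alpha, np.sqrt(var)

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement below `best` (we are minimizing)."""
    gap = best - mean - xi
    z = gap / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return gap * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=15, seed=0):
    """Minimize `objective` over a hyperparameter rescaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    x_obs = rng.choice(candidates, size=n_init, replace=False)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior is reasonable.
        y_std = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-9)
        mean, std = gp_posterior(x_obs, y_std, candidates)
        ei = expected_improvement(mean, std, best=y_std.min())
        x_next = candidates[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    i = np.argmin(y_obs)
    return x_obs[i], y_obs[i]

# Hypothetical stand-in for "validation loss vs. rescaled learning rate".
best_x, best_y = bayes_opt(lambda x: (x - 0.3) ** 2)
print(best_x, best_y)
```

Each iteration spends one (expensive) training run where the surrogate expects the most improvement, which is why this tends to need far fewer runs than a full grid search.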

And I'll also acknowledge that "programming" is a vastly more mature field than "deep learning" or even "machine learning". So it's fair to argue that there's much we don't know, but there's more and more that we do.