I didn't work on it long enough to be able to draw any conclusions, but I can speculate.
I had the gradients going through the soft life approximation (i.e. it was part of the model), rather than simply training a normal cnn with life boards as the inputs and outputs. But I think the approximation may not have good enough gradient signals.
Could it be a case of [1] (but on a non grand scale) :P? I can list my sources of inspiration: [2] [3] [4].
I also tried training convolutional networks, using the soft life set-up, but failed to get them to converge.
[1] https://en.wikipedia.org/wiki/Multiple_discovery
[2] https://kevingal.com/blog/mona-lisa-gol.html
[3] https://arxiv.org/abs/1910.00935
[4] https://nicholasrui.com/2017/12/18/convolutions-and-the-game...