by VMG 4993 days ago
I've seen a detailed explanation of a very sophisticated implementation of GOL in Java (using caching, cycle detection and similar things), but I can't find it right now. Damn.

Edit: found it http://www.ibiblio.org/lifepatterns/lifeapplet.html

I tend to think of cellular automata optimization as being related to data compression. Both are simple concepts with no simple solution, and which solutions are best depends on the type of data being processed. In Conway's Life, patterns tend to be blobby.

For blobby universes, one should probably consider dividing the universe up into blocks approximately the size of the blobs. For Life, 4x4 to 8x8 seem reasonable. I chose the upper bound, 8x8, for reasons of convenience: there happen to be 8 bits in a byte. I strongly considered 4x4, but it didn't work out as nicely....
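To make the "8 bits in a byte" point concrete, here is a minimal Python sketch of one 8x8 block stored as 8 row bytes. The names (pack_block, get_cell) and the bit order (leftmost cell in the high bit) are my own assumptions for illustration, not taken from the applet.

```python
# Sketch only: one 8x8 block stored as 8 bytes, one byte per row.
# Assumed layout: the leftmost cell of a row is the most significant bit.
BLOCK = 8

def pack_block(cells):
    """cells: 8 rows of 8 ints (0 or 1). Returns 8 bytes, one per row."""
    rows = []
    for row in cells:
        byte = 0
        for x, cell in enumerate(row):
            if cell:
                byte |= 1 << (BLOCK - 1 - x)
        rows.append(byte)
    return bytes(rows)

def get_cell(block, x, y):
    """Read back the cell at column x, row y of a packed block."""
    return (block[y] >> (BLOCK - 1 - x)) & 1

# A horizontal blinker in row 4, columns 3..5.
blinker = [[0] * BLOCK for _ in range(BLOCK)]
for x in (3, 4, 5):
    blinker[4][x] = 1
packed = pack_block(blinker)
```

With this layout a whole row can be shifted or masked in one operation, which is presumably part of the convenience of 8x8.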

You should put the blocks in some kind of list, so that you waste zero time in the empty parts of the universe.

Already, note a complication: New elements in the list must be introduced if the pattern grows over a block's boundaries, but we have to know if the block's neighbor already exists. You can do a simple linear search of the list, or a binary search, or keep some kind of map. I chose to make a hash table. This is solely used for finding the neighbors of a new block; each existing block already keeps a pointer to its neighbors, as they will be referenced often.
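A rough sketch of the list-plus-hash-table arrangement described above, in Python. The class and method names (Block, Universe, get_or_create) are hypothetical; the only point is that the dict is consulted when a block is created, and after that blocks reach each other through stored references.

```python
# Sketch only: blocks live in a list (iterated every generation) and in a
# hash table keyed by block coordinates (used only when creating a block).
OFFSETS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]

class Block:
    def __init__(self, bx, by):
        self.bx, self.by = bx, by
        self.rows = bytearray(8)   # 8x8 cells, one byte per row
        self.neighbors = {}        # (dx, dy) -> Block, filled in on creation

class Universe:
    def __init__(self):
        self.blocks = []           # only non-empty regions appear here
        self.index = {}            # (bx, by) -> Block

    def get_or_create(self, bx, by):
        block = self.index.get((bx, by))
        if block is not None:
            return block
        block = Block(bx, by)
        self.index[(bx, by)] = block
        self.blocks.append(block)
        # Link the new block with any neighbors that already exist.
        for dx, dy in OFFSETS:
            other = self.index.get((bx + dx, by + dy))
            if other is not None:
                block.neighbors[(dx, dy)] = other
                other.neighbors[(-dx, -dy)] = block
        return block
```

When a pattern grows over the right edge of a block at (0, 0), calling get_or_create(1, 0) either returns the existing neighbor or creates and links a new one.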

There must also be an efficient algorithm within the blocks. I chose to primarily blaze straight through each block. There are no inner loops until all cells in a block are processed. Also, fast-lookup tables are employed. I look up 4x4 blocks to determine the inner 2x2.
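The 4x4-to-2x2 lookup can be sketched like this in Python. The bit layout is an assumption of mine (cell (x, y) of the 4x4 window is bit y*4 + x of the index, and the result uses bit j*2 + i for inner cell (i+1, j+1)); the applet's actual table layout may well differ.

```python
# Sketch only: precompute the next state of the inner 2x2 for every 4x4 window.
def build_table():
    table = bytearray(1 << 16)          # 65536 entries, 4 result bits each
    for idx in range(1 << 16):
        out = 0
        for j in (1, 2):                # inner rows
            for i in (1, 2):            # inner columns
                alive = (idx >> (j * 4 + i)) & 1
                n = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dx or dy:
                            n += (idx >> ((j + dy) * 4 + (i + dx))) & 1
                if n == 3 or (alive and n == 2):
                    out |= 1 << ((j - 1) * 2 + (i - 1))
        table[idx] = out
    return table

def address(rows, x, y):
    """Form the table index for the 4x4 window whose top-left cell is (x, y).
    rows[r] is an int whose bit c is cell (c, r) (assumed layout)."""
    a = 0
    for j in range(4):
        a |= ((rows[y + j] >> x) & 0xF) << (4 * j)
    return a

TABLE = build_table()
```

Building the table takes a moment in Python, but it is done once; after that each 2x2 result costs a few shifts, a mask, and one array read.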

Note: CA programs typically consist of 2 main loops (plus a display loop), because CA rules operate on the cells in parallel, while the microprocessor is conceptually serial. This means that there must be two copies of the universe, effectively, so that no important info is destroyed in the process of creating the next generation. Often these 2 copies are not symmetrical. It was a great struggle for me, since almost every time I took something out of one loop to make it faster, I had to add something else to the other loop! Almost every time, that is; the exceptions to that rule lead to the best optimizations. In particular, there are good tradeoffs to be considered in bit-manipulations: shifting, masking, recombining to form an address in the lookup table....
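For anyone who hasn't seen the "two copies of the universe" idea, here is the naive version in Python: read only from one grid, write only to the other, then swap roles. This is a plain bounded grid, not the block scheme above, and the function name is made up.

```python
# Sketch only: double buffering on a small bounded grid (cells outside the
# grid are treated as dead). No cell of `cur` is modified while it is read.
def step(cur, nxt):
    h, w = len(cur), len(cur[0])
    for y in range(h):
        for x in range(w):
            n = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dx or dy) and 0 <= y + dy < h and 0 <= x + dx < w:
                        n += cur[y + dy][x + dx]
            nxt[y][x] = 1 if n == 3 or (cur[y][x] and n == 2) else 0
    return nxt, cur    # swapped: the new generation first, the scratch copy second

# A horizontal blinker on a 5x5 grid.
a = [[0] * 5 for _ in range(5)]
b = [[0] * 5 for _ in range(5)]
for x in (1, 2, 3):
    a[2][x] = 1
gen1, scratch = step(a, b)
gen2, scratch = step(gen1, scratch)
```

Updating a single grid in place would corrupt neighbor counts, because cells processed later would see already-updated neighbors.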

The contents of a block may sometimes stabilize, requiring no further processing. You could take the block out of the list, putting it in a "hibernation" state, only to be re-activated if a neighboring block has some activity spilling into it. These blocks would take zero processing time, just like a blank region of the universe.
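One way the hibernation bookkeeping might look, as a Python sketch. Everything here is hypothetical: the compute callback stands in for the real per-block update and is assumed to return whether the block changed plus the neighbors it spilled activity into.

```python
# Sketch only: blocks that did not change drop out of the active list;
# any block that receives spill-over from a neighbor is put back in.
class Block:
    def __init__(self, name):
        self.name = name
        self.asleep = False

class Scheduler:
    def __init__(self, blocks):
        self.active = list(blocks)

    def run_generation(self, compute):
        """compute(block) -> (changed, spills); a hypothetical callback."""
        next_active = []
        seen = set()
        for block in self.active:
            changed, spills = compute(block)
            wanted = ([block] if changed else []) + list(spills)
            for b in wanted:
                if id(b) not in seen:
                    seen.add(id(b))
                    next_active.append(b)
        for block in self.active:
            block.asleep = True        # assume asleep unless re-listed below
        for block in next_active:
            block.asleep = False
        self.active = next_active
```

Sleeping blocks are never visited, so they cost nothing per generation, the same as empty space.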

Period-2 oscillators might also not be very difficult to detect and exclude from processing. This might be worthwhile in Life, because the blinker is the most common kind of random debris. Higher-period oscillators are much rarer. It is also possible that gliders could be detected and simulated. You will get diminishing returns from this kind of optimization, unless you take it to an extreme (cf. HashLife).
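Detecting still lifes and period-2 oscillators can be as simple as remembering a block's last two states. A Python sketch with made-up names; it only compares the block's own contents, so a real implementation would presumably also need to check that the neighbors are quiet.

```python
# Sketch only: classify a block by comparing its state with the previous
# one (still) and the one before that (period 2).
from collections import deque

class BlockHistory:
    def __init__(self):
        self.last = deque(maxlen=2)    # the two most recent states, oldest first

    def classify(self, state):
        """Record `state` and return 'still', 'period-2', or 'active'."""
        result = "active"
        if len(self.last) >= 1 and state == self.last[-1]:
            result = "still"
        elif len(self.last) == 2 and state == self.last[0]:
            result = "period-2"
        self.last.append(state)
        return result

# Two phases of a blinker, each given as 8 row bytes.
phase_a = bytes([0, 0, 0, 0, 0b00011100, 0, 0, 0])
phase_b = bytes([0, 0, 0, 0b00001000, 0b00001000, 0b00001000, 0, 0])
```

Feeding phase_a, phase_b, phase_a in turn yields 'active', 'active', 'period-2'.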

Also, a block of cells that's completely empty might not be worth deallocating and removing from the hash table for a while. That takes some processing time, which could be significant in the case of an oscillator moving in and out of its space repeatedly. Only when memory gets low should the oldest blocks from the "morgue" be recycled.
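The "morgue" could be sketched as an ordered map of empty blocks, oldest first. This is Python with invented names, and a fixed capacity stands in for "memory gets low", which is my simplification.

```python
# Sketch only: empty blocks are parked rather than freed; the oldest are
# recycled only when the morgue exceeds its capacity.
from collections import OrderedDict

class Morgue:
    def __init__(self, capacity):
        self.capacity = capacity       # stand-in for a low-memory condition
        self.dead = OrderedDict()      # key -> block, oldest first

    def bury(self, key, block):
        """Park an empty block. Returns the (key, block) pairs recycled, if any."""
        self.dead[key] = block
        self.dead.move_to_end(key)
        recycled = []
        while len(self.dead) > self.capacity:
            recycled.append(self.dead.popitem(last=False))
        return recycled

    def revive(self, key):
        """Take a block back out when activity returns; None if not present."""
        return self.dead.pop(key, None)
```

An oscillator that keeps moving in and out of a block then costs a cheap bury/revive pair each time instead of an allocation and a hash table update.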

Once the program is fast enough, it isn't worth displaying generations any faster than the eye can see, or at least not much faster than the refresh rate of the monitor. Especially in windowed environments, display time can be a real bottleneck.
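Capping the display rate is easy to bolt on. A small Python sketch; FrameLimiter and should_draw are names I made up, and the 60 fps default is just a typical monitor refresh rate, not a figure from the text.

```python
# Sketch only: compute generations as fast as possible, but redraw at most
# max_fps times per second.
import time

class FrameLimiter:
    def __init__(self, max_fps=60.0, clock=time.monotonic):
        self.interval = 1.0 / max_fps
        self.clock = clock             # injectable so it can be tested
        self.next_due = None

    def should_draw(self):
        now = self.clock()
        if self.next_due is None or now >= self.next_due:
            self.next_due = now + self.interval
            return True
        return False
```

In the main loop you would compute a generation every iteration and call the (hypothetical) draw routine only when should_draw() returns True.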