I'd love to see a blog post detailing some of the optimizations you did to speed this up. Presumably you use octree/voxel segmentation to re-use already summed regions of the CA?
For the simplest MNCA implementations, I don't use any special optimizations. It's essentially just brute-force naive convolution.
That said, some of the more advanced models I've made use up to 48 neighborhoods.
For those, I split the neighborhoods into 'Width-1' rings for each 'Radius' value, and then assemble the final neighborhoods by referencing some combination of those rings.
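As a rough illustration of the ring idea, here is a minimal CPU-side sketch in Python rather than the actual shader code. The function names (`ring_offsets`, `ring_sums`, `neighborhood_sum`), the toroidal wrap-around, and the rule that an offset belongs to ring r when its distance rounds to r are all assumptions made for the example, not details confirmed above.

```python
import math

def ring_offsets(max_radius):
    """Group integer (dx, dy) offsets into width-1 rings keyed by radius.

    Assumption: an offset belongs to ring r when its Euclidean distance
    rounds to r. The real shader may bin offsets differently.
    """
    rings = {r: [] for r in range(1, max_radius + 1)}
    for dy in range(-max_radius, max_radius + 1):
        for dx in range(-max_radius, max_radius + 1):
            r = int(math.hypot(dx, dy) + 0.5)
            if 1 <= r <= max_radius:  # skips the center cell and far corners
                rings[r].append((dx, dy))
    return rings

def ring_sums(grid, x, y, rings):
    """Sum each ring once around cell (x, y), wrapping at the grid edges."""
    h, w = len(grid), len(grid[0])
    return {
        r: sum(grid[(y + dy) % h][(x + dx) % w] for dx, dy in offsets)
        for r, offsets in rings.items()
    }

def neighborhood_sum(sums, ring_ids):
    """Assemble a neighborhood total from ring sums that were already computed."""
    return sum(sums[r] for r in ring_ids)

# Usage: two hypothetical neighborhoods sharing the same per-ring sums.
rings = ring_offsets(10)
grid = [[(x + y) % 2 for x in range(32)] for y in range(32)]
sums = ring_sums(grid, 16, 16, rings)
inner = neighborhood_sum(sums, [1, 2, 3])   # a disc of radius 3
outer = neighborhood_sum(sums, [7, 8, 10])  # a mix of outer rings
```

Each cell value is read once per ring pass, and any number of neighborhoods can then be built from the same ten ring sums with a handful of additions.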
For the 'RGB-SMNCA' models with a maximum Radius of 10, this means that I can reduce the total texelFetch() calls from a maximum of 16 * 3 * 348 = 16,704 (!!!) down to 1 * 3 * 348 = 1,044.
These numbers come from: ( Neighborhoods * Color Channels * Radius-10 Neighbor Count )
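The arithmetic can be checked with a short Python sketch. It assumes that a radius-R neighborhood contains every cell whose distance from the center is below R + 0.5, with the center cell excluded; that cutoff is a guess, though it does reproduce the 348 neighbor count. The 16 neighborhoods and 3 color channels are read off the figures above, and `neighbor_count` and `fetch_count` are hypothetical helper names.

```python
def neighbor_count(radius):
    """Count cells within radius + 0.5 of the center, excluding the center.

    Assumption: the distance < radius + 0.5 cutoff is inferred, not confirmed.
    The comparison is done in integers: 4 * d^2 < (2 * radius + 1)^2.
    """
    limit = (2 * radius + 1) ** 2
    return sum(
        1
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if (dx or dy) and 4 * (dx * dx + dy * dy) < limit
    )

def fetch_count(neighborhoods, channels, radius):
    """Neighborhoods * Color Channels * Neighbor Count."""
    return neighborhoods * channels * neighbor_count(radius)

naive = fetch_count(16, 3, 10)   # every neighborhood fetches its own texels
shared = fetch_count(1, 3, 10)   # each ring texel is fetched once and reused

print(neighbor_count(10))  # 348
print(naive, shared)       # 16704 1044
```

Under these assumptions the saving is a factor of 16, which is simply the number of neighborhoods that share the same ring sums.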
3D (S)MNCAs are something I certainly plan on developing, and will likely implement after I'm finished with the headless_engine branch of VulkanAutomata, which will hopefully (finally) allow for a proper cross-platform compatible application.
I'd love to see a blog post detailing some of the optimizations you did to speed this up. Presumably you use octree/voxel segmentation to re-use already summed regions of the CA?
I'd also love to see some 3D MNCAs...