Hacker News new | ask | show | jobs
by blamestross 2176 days ago
(See sibling comment NEAT is awesome)

The only reason we architect ANNs the way we do is optimization of computation. The bipartite graph structure is optimized for GPU matrix math. Systems like NEAT have not been used at scale because they are a lot more expensive to train and to utilize the trained network with. ASICs and FPGAs have a change to utilize a NEAT generated network in production, but we still don't have a computer well suited to training a NEAT network.

2 comments

So this might be an enormous opportunity for low-cost and more performant AI if someone was able to build an FPGA of some sort that could handle these types of computations as efficiently right?
Running the post-training network is a solved problem. (FPGA and ASIC can do it just fine). TRAINING the network is the difficulty. The problem is that the structure of the network is arbitrary and is a result of the learning process. You can't optimize a computation for a structure you don't know yet. Bipartite layer networks have the benefit of never changing structure but they can approximate other subset structures. I don't know if we could easily tell where we are on the tradeoff between "bipartite graphs are trained efficiently but are inefficiently simulating a smaller network in practice"
NEAT just doesn't have good, modern GPU powered implementations.

NEAT would totally be competitive if someone actually gets a version running in PyTorch/Tensorflow

It's not that simple. Backpropagating a bipartite graph of nodes works out to a series of matrix operations that parallelize efficiently on a GPU as long as the matrices fit into the GPU's working memory. Running a GA (part of neat) doesn't normally work well on a GPU. The good NEAT algorithms even allow different neurons to have different firing response curves. This inherently defies the "same operation multiple values" style of parallelization in GPUs. The way GPUs work just fundamentally isn't well suited to speeding up NEAT.
You may be interested in this implementation [1] which builds the networks using PyTorch.

[1] https://github.com/uber-research/PyTorch-NEAT

It uses pytorch (and I'm probably going to use it), but doesn't effectively leverage a GPU for training.
What do you think is the best way to accomplish this?
You don't. You need a different parallelism model than a GPU provides. It could work well on machines with very high CPU count, but the speedup on GPUs is the main reason bipartite graph algorithms have seen such investment.