It's not that simple. Backpropagating a bipartite graph of nodes works out to a series of matrix operations that parallelize efficiently on a GPU as long as the matrices fit into the GPU's working memory. Running a GA (part of neat) doesn't normally work well on a GPU. The good NEAT algorithms even allow different neurons to have different firing response curves. This inherently defies the "same operation multiple values" style of parallelization in GPUs. The way GPUs work just fundamentally isn't well suited to speeding up NEAT.
You don't. You need a different parallelism model than a GPU provides. It could work well on machines with very high CPU count, but the speedup on GPUs is the main reason bipartite graph algorithms have seen such investment.