Ah OK, then you're better off learning OpenCL. If you don't have access to a beefy graphics card, just rent one by the hour from Amazon. You can prototype on your laptop's CPU, then run the real simulation on a remote machine.
OpenCL vs CUDA is a pretty boring debate: on an NVIDIA card both run on the same hardware and so perform similarly. The real difference is in the tooling and ecosystems. CUDA is NVIDIA-only, while you can run OpenCL on other vendors' GPUs, on CPUs, and even on FPGAs.
Out of curiosity, has anyone successfully deployed some OpenCL code across very different platforms?
It's neat that it'll compile and run on a GPU, CPU, or FPGA, but I'd expect code written for one style of architecture to be appallingly slow on the others.
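To make that worry a bit more concrete, here's a rough sketch of one common source of the mismatch: how work-items are mapped onto array elements. It's plain Python rather than OpenCL, and the function names (`interleaved_indices`, `blocked_indices`) are made up for illustration. GPUs generally like neighbouring work-items to touch neighbouring addresses at the same time (coalesced access), while CPU threads generally do better walking their own contiguous chunk (cache locality), so a kernel hard-coded to one mapping tends to suit only one of them.

```python
def interleaved_indices(work_item, num_items, n):
    """GPU-style mapping: work-item i handles elements i, i + num_items, ...

    At any given step, neighbouring work-items touch neighbouring elements.
    """
    return list(range(work_item, n, num_items))


def blocked_indices(work_item, num_items, n):
    """CPU-style mapping: each work-item walks its own contiguous chunk."""
    chunk = (n + num_items - 1) // num_items  # ceiling division
    start = work_item * chunk
    return list(range(start, min(start + chunk, n)))


if __name__ == "__main__":
    n, num_items = 16, 4  # hypothetical sizes, chosen only for the demo
    for wi in range(num_items):
        print(wi, interleaved_indices(wi, num_items, n),
              blocked_indices(wi, num_items, n))
```

Both mappings cover every element exactly once. Only the access order differs, which is the sort of thing you'd probably end up tuning per device rather than writing once.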