by Freire_Herval 1156 days ago
You need to move these sections onto the GPU. Neither Zig, Rust, C++ nor C can do that on its own; you'll need CUDA or Futhark. That's where the real speedup for this kind of workload comes from.

Optimization in this case is largely about managing memory on the GPU and keeping data transfers between the CPU and GPU to a minimum.
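A minimal CUDA sketch of that pattern: allocate once, copy in once, run every step on the device-resident buffer, copy out once. The elementwise update in step_kernel and the helper run_steps_on_gpu are hypothetical stand-ins for whatever hot section is being offloaded, not code from any particular project.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical per-element update; stands in for the offloaded section.
__global__ void step_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * 0.5f + 1.0f;
    }
}

// Hypothetical helper: copies `host` to the GPU once, runs `steps` kernel
// launches on the device-resident buffer, then copies the result back once.
// Returns false if any CUDA call fails.
bool run_steps_on_gpu(float* host, int n, int steps) {
    if (n <= 0) return true;  // nothing to do; avoids a zero-block launch

    float* dev = nullptr;
    std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    // Single host -> device transfer.
    if (cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        // The data stays on the GPU between launches; no per-step copies.
        step_kernel<<<blocks, threads>>>(dev, n);
    }
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    // Single device -> host transfer; on the default stream this waits
    // for the queued kernels to finish before copying.
    cudaError_t err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

With this layout the buffer crosses between host and device twice in total, rather than twice per step, which is the kind of transfer minimization described above.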

Essentially, it's a massively parallel API combined with a massively parallel processor. I'm thinking of doing an end-to-end tutorial on this.