Hacker News new | ask | show | jobs
by lukan 728 days ago
"Collision detection is usually a tree search"

Yes, because of the very limited numbers of CPU cores. With a GPU you can just assign one core to one particle.

Here is a simple approach to do it with WebGPU:

https://surma.dev/things/webgpu/

It uses the very simple approach, of testing every particle with EVERY other particle. Still very performant (the simulation, the choosen rendering with canvas is very slow)

I currently try to do something like this, but optimised. With the naive approache here and Pixi instead of canvas, I get to 20000 particles 120 fps on an old laptop. I am curious how far I get with an optimized version. But yes, the danger is in calculating and rendering blocking each other. So I have to use the CPU in a smart way, to limit the data being pushed to the GPU. And while I prepare the data on the CPU, the GPU can do the graphic rendering. Like I said, it is way harder to do right this way. When the simulation behaves weird, debugging is pain.

2 comments

If you use WebGPU, for your acceleration structure, try to use the algorithm here presented in the Diligent Engine repo. This will allow you not to transfer data back and forth between CPU and GPU: https://github.com/DiligentGraphics/DiligentSamples/tree/mas...

Another reason I did it on CPU was because with WebGL you lack certain things like atomics and groupshared memory, which you now have with WGPU. For the Diligent Engine spatial hashing, atomics is required. I'm mainly using WebGL because of compatibility. iOS Safari still doesn't enable WGPU without special feature flags that user has to enable.

Thanks a lot, that is very interesting! I will check it out in detail.

But currently I will likely proceed with my approach where I do transfer data back and forth between CPU and GPU, so I can make use of the CPU to do all kinds of things. But my initial idea was also to keep it all on the GPU, I will see what works best.

And yes, I also would not recommend WebGPU currently for anything that needs to deploy soon to a wide audience. My project is intended as a long term experiment, so I can live with the limitations for now.

This is a 2D simulation with only self-collisions, and not collisions against external geometry. The author suggests a simulation time of 16ms for 14000 particles. State of the art physics engines can do several times more, on the CPU, in 3D, while colliding with complex geometry with hundreds of thousands of triangles. I understand this code is not optimized, but I'd say the workload is not really comparable enough to talk about the benefits of CPU vs GPU for this task.

The O(n^2) approach, I fear, cannot really scale to much beyond this number, and as soon as you introduce optimizations that make it less than O(n^2), you've introduced tree search or spatial caching that makes your single "core" (WG) per particle diverge.

"that make it less than O(n^2), you've introduced tree search or spatial caching that makes your single "core" (WG) per particle diverge"

Well, like I said, I try to use the CPU side to help with all that. So every particle on the GPU checks maybe the 20 particles around it for collision (and other reactions) and not 14000, like it is currently.

That should give a different result.

Once done with this sideproject, I will post my results here. Maybe you are right and it will not work out, but I think a found a working compromise.