	by brrrrrm 590 days ago
	WebGPU unfortunately cannot come close, since it doesn't support hardware-specific memory or warp-level primitives (like TMA or tensor cores). it's not like it gets 80% of peak perf; it gets < 30% of peak perf for anything related to compute-heavy matrix multiplications

3 comments

Const-me 590 days ago

> don't have support for hardware specific memory

I have no experience with WebGPU but if you mean group shared memory, I think the support is available. See the demo: https://compute.toys/view/25

zanussbaum 590 days ago

i tried using workgroup shared memory and found it slower than just recomputing everything in each thread although i may have been doing something dumb

i'm excited to try subgroups though: https://developer.chrome.com/blog/new-in-webgpu-128#experime...
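
For readers following along, here is a minimal sketch of what the workgroup shared memory approach discussed above usually looks like for a matmul: each workgroup stages one tile of A and one tile of B in `var<workgroup>` storage, then every invocation reads from those tiles instead of global memory. This is not the code from either commenter. The names `TILE`, `Dims`, `tileA`, `tileB`, `tiledMatmulWGSL`, and `dispatchSize` are hypothetical, and the tile size of 16 is an assumption rather than a tuned value.

```typescript
// Assumed tile edge; a 16 x 16 workgroup is 256 invocations, WebGPU's default limit.
const TILE = 16;

// WGSL compute shader sketch: C = A * B, row-major f32, A is m x k, B is k x n.
const tiledMatmulWGSL = `
struct Dims { m: u32, n: u32, k: u32 }

@group(0) @binding(0) var<uniform> dims: Dims;
@group(0) @binding(1) var<storage, read> a: array<f32>;
@group(0) @binding(2) var<storage, read> b: array<f32>;
@group(0) @binding(3) var<storage, read_write> c: array<f32>;

// One tile of A and one tile of B, shared by every invocation in the workgroup.
var<workgroup> tileA: array<array<f32, ${TILE}>, ${TILE}>;
var<workgroup> tileB: array<array<f32, ${TILE}>, ${TILE}>;

@compute @workgroup_size(${TILE}, ${TILE})
fn main(@builtin(global_invocation_id) gid: vec3<u32>,
        @builtin(local_invocation_id) lid: vec3<u32>) {
  let row = gid.y;
  let col = gid.x;
  var acc = 0.0;
  let numTiles = (dims.k + ${TILE}u - 1u) / ${TILE}u;
  for (var t = 0u; t < numTiles; t++) {
    let aCol = t * ${TILE}u + lid.x;
    let bRow = t * ${TILE}u + lid.y;
    // Each invocation loads one element per tile, zero-padding past the edges.
    var av = 0.0;
    if (row < dims.m && aCol < dims.k) { av = a[row * dims.k + aCol]; }
    var bv = 0.0;
    if (bRow < dims.k && col < dims.n) { bv = b[bRow * dims.n + col]; }
    tileA[lid.y][lid.x] = av;
    tileB[lid.y][lid.x] = bv;
    workgroupBarrier(); // both tiles must be fully loaded before anyone reads them
    for (var i = 0u; i < ${TILE}u; i++) {
      acc += tileA[lid.y][i] * tileB[i][lid.x];
    }
    workgroupBarrier(); // finish reading before the next iteration overwrites the tiles
  }
  if (row < dims.m && col < dims.n) { c[row * dims.n + col] = acc; }
}
`;

// Number of workgroups along x (columns of C) and y (rows of C).
function dispatchSize(m: number, n: number): [number, number] {
  return [Math.ceil(n / TILE), Math.ceil(m / TILE)];
}
```

The string would be passed to `device.createShaderModule({ code: tiledMatmulWGSL })` and launched with `pass.dispatchWorkgroups(...dispatchSize(m, n))`. Whether this beats recomputing in each thread depends on the GPU and the tile size, so the slowdown reported above is plausible.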

kayvr 590 days ago

I've heard the WebGPU workgroup wants to close the gap on tensor core support.

zanussbaum 590 days ago

you're definitely right, 80% was a bit of an overestimation, especially with respect to CUDA

it would be cool to see if there's some way to get better access to those lower-level primitives, but i would be surprised

it does seem like subgroup support is a step in the right direction though!