


	by sebakubisz 58 days ago
	This is the kind of porting work I always hope for when I see a CUDA-only release. Have you thought about publishing the gather-scatter sparse 3D convolution and SDPA attention swaps as a standalone toolkit or writeup? A lot of folks running models locally on Apple Silicon hit the same wall with flash_attn, nvdiffrast, and custom sparse kernels and end up redoing the same work.


That makes so much sense. I'm looking into whether someone has already done this well; if not, I'll try to do it myself.