	by simon_vtr 375 days ago
	The CUDA kernels I mention use logic equivalent to the Mojo kernels. You can find them on my GitHub: https://github.com/simveit/effective_transpose If you have a faster kernel for H100, feel free to submit it as a PR, and I will merge it after checking that it is faster.