| HN Mirror



	by diwank 822 days ago
	Seconded. I’m guessing you could create an implementation that is able to do that and then write optimised Triton/CUDA kernels to accelerate it, but I need to investigate further.