HN Mirror

Hacker News


	by jmward01 5 days ago
	I'd love to see a side-by-side-by-side comparison of implementing a few example ops in Triton, in CUDA/C++, with plain torch.compile, etc. I've broken out Triton a lot, but I've found it's very hit or miss how much I gain over just using torch.compile. Probably a lot of that is my skill level, and a lot is how much torch.compile can fuse and optimize when it's given raw PyTorch to work with.
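	A minimal sketch of what one row of that comparison might look like, for a hypothetical fused add + ReLU op: plain eager PyTorch, the same function wrapped in torch.compile, and a hand-written Triton kernel, all checked against each other and timed with triton.testing.do_bench. The op choice, the function names, BLOCK_SIZE=1024 and the tensor size are all arbitrary assumptions for illustration, and the timing path needs a CUDA GPU with Triton installed.

```python
import torch

try:
    import triton
    import triton.language as tl
    import triton.testing
    HAS_TRITON = True
except ImportError:  # Triton is optional; the eager path still works without it
    HAS_TRITON = False


def add_relu_eager(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Plain PyTorch: two separate ops (add, then relu) in eager mode.
    return torch.relu(x + y)


if HAS_TRITON:

    @triton.jit
    def add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of BLOCK_SIZE elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the partial tail block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        # Hand-fused: add and relu in registers, one read per input, one write.
        tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

    def add_relu_triton(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Assumes x and y are contiguous CUDA tensors with the same shape and dtype.
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out


def compare(n: int = 1 << 24) -> dict:
    """Check that all three variants agree, then time each one (GPU only)."""
    # torch.compile is lazy; the actual compilation happens on the first call.
    add_relu_compiled = torch.compile(add_relu_eager)
    x = torch.randn(n, device="cuda")
    y = torch.randn(n, device="cuda")
    ref = add_relu_eager(x, y)
    variants = {
        "eager": add_relu_eager,
        "torch.compile": add_relu_compiled,
        "triton": add_relu_triton,
    }
    timings = {}
    for name, fn in variants.items():
        torch.testing.assert_close(fn(x, y), ref)
        # do_bench handles warmup and repetition; result is in milliseconds.
        timings[name] = triton.testing.do_bench(lambda fn=fn: fn(x, y))
    return timings


if __name__ == "__main__" and HAS_TRITON and torch.cuda.is_available():
    for name, ms in compare().items():
        print(f"{name:>14}: {ms:.3f} ms")
```

	The numbers depend entirely on the GPU, dtype and tensor size, so none are given here. For an elementwise op this simple, Inductor (torch.compile's default backend) already generates a fused Triton kernel on GPU, so one would expect little or no gap between the last two rows; a hand-written kernel presumably has more room to win on ops the compiler doesn't fuse or schedule well.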