| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by qeternity 244 days ago
	PyTorch is only part of it. There is still a huge amount of CUDA that isn’t just wrapped by PyTorch and isn’t easily portable.

1 comments

... but not in deep learning or am I missing something important here?

Yes, absolutely in deep learning. Custom fused CUDA kernels everywhere.

Yep. MoE, FlashAttention, or sparse retrieval architectures for example.