by svara 197 days ago
... but not in deep learning, or am I missing something important here?

Yes, absolutely in deep learning. Custom fused CUDA kernels everywhere.
Yep. MoE, FlashAttention, or sparse retrieval architectures, for example.