Hacker News
by qeternity 200 days ago
Yes, absolutely, in deep learning. Custom fused CUDA kernels are everywhere.
1 comment

Yep. MoE, FlashAttention, and sparse retrieval architectures, for example.