Hacker News

by refibrillator 260 days ago

Great exposition, loved the touch of humor. Please do the backward pass when it’s published.

As a fellow Tri Dao groupie and lucky duck who gets to build on Hopper/Blackwell clusters, I find it amazing how difficult it is becoming to write kernels that saturate GPU hardware.

When I squint, I see a trend emerging across work like FA4, monolithic (mega) kernels, etc.: a subversion of the classic CUDA programming model in the form of fine-grained, task-based parallelism managed entirely in “user space”.

Not exactly sure what’s ahead, but I’m strapping in for a wild ride…


kweezar 260 days ago

Any great beginner-friendly resources for learning GPU programming?


arthurcolle 259 days ago

Modal's CUDA Book is cool


charles_irl 260 days ago

Thanks! I think computers are fun and I want reading about them to be fun too.

I was also reminded of HazyResearch's MegaKernels. Didn't want to distract from the main thrust of the post, but definitely think that's a promising approach.


emaadm 259 days ago

There's some interesting work at NeurIPS this year on fused kernels for MoE too: https://flash-moe.github.io/
