Hacker News
by totalperspectiv 369 days ago
In the coarse-graining code, you use an @parameter-for. Doesn’t unrolling that lead to a pretty large code size? Or is that less of an issue on GPU?

Great write up! I learned a lot!

1 comment

It doesn’t. The batch size is just 8, so the unrolled loop stays small. This is a very good trick and is often needed to achieve peak performance in memory-bound kernels. You can check out the equivalent code in CUDA as well :)
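
For anyone who wants to see the shape of the trick, here is a minimal host-side C++ sketch. It is not the article's code. The names `coarse_grain_sum` and `coarse_grain8` are made up for illustration, and summing 8 neighbours is only an assumed coarse-graining rule. The point is just that a compile-time batch size gives the inner loops a fixed trip count of 8, so full unrolling adds a handful of instructions rather than a large amount of code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): collapses every `Batch`
// consecutive inputs into one output by summing them.
template <std::size_t Batch>
void coarse_grain_sum(const float* in, float* out, std::size_t n_out) {
    for (std::size_t i = 0; i < n_out; ++i) {
        float vals[Batch];
        // Batch is a compile-time constant, so the compiler can fully unroll
        // this loop into Batch independent loads issued back to back.
        for (std::size_t k = 0; k < Batch; ++k) {
            vals[k] = in[i * Batch + k];
        }
        // Same fixed trip count here: Batch adds, no loop overhead once unrolled.
        float acc = 0.0f;
        for (std::size_t k = 0; k < Batch; ++k) {
            acc += vals[k];
        }
        out[i] = acc;
    }
}

// Convenience wrapper with the batch size from the thread (8).
// Trailing elements that do not fill a whole batch are ignored.
std::vector<float> coarse_grain8(const std::vector<float>& in) {
    constexpr std::size_t Batch = 8;
    std::vector<float> out(in.size() / Batch);
    coarse_grain_sum<Batch>(in.data(), out.data(), out.size());
    return out;
}
```

In a real GPU kernel the same structure would sit inside each thread. Splitting the loads from the arithmetic is what lets several memory requests be in flight at once, which is presumably why it helps in the memory-bound case.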