Hacker News
by jebarker 241 days ago
This is a great write-up and I’d love to see more like it. Debugging this sort of thing in the Megatron -> PyTorch -> CUDA stack is what my team, an ML research team, spends more than half of its time on.

Wouldn't the Nsight Systems suite provide coverage here? Are the tricky cases difficult to debug with the standard CUDA tooling stack?
Yes, nsys is very helpful, especially for perf issues. Often, though, bugs present the way they did in this blog post: you just notice that the training curves have regressed somehow. So even with good tooling it can be hard to figure out where to start looking in systems this complex. It only gets worse when the symptoms appear only after running for a long time, at scale, on a cluster.
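One cheap first step for the "training curves have regressed somehow" case is to localize *when* the regression starts before reaching for a profiler. The following is a minimal sketch, assuming you have per-step loss values logged for a known-good baseline run and for a suspect run with the same data order and seed. It smooths both curves with an exponential moving average and reports the first step where the suspect run stays above the baseline. The function names, the EMA smoothing, and the default tolerances are hypothetical choices for illustration, not something from the blog post or from nsys.

```python
from typing import List, Optional, Sequence


def ema(values: Sequence[float], alpha: float = 0.1) -> List[float]:
    """Exponential moving average, used to smooth noisy per-step losses."""
    smoothed: List[float] = []
    acc: Optional[float] = None
    for v in values:
        # First value seeds the average; later values are blended in by alpha.
        acc = v if acc is None else alpha * v + (1 - alpha) * acc
        smoothed.append(acc)
    return smoothed


def first_regression_step(
    baseline: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 0.02,
    alpha: float = 0.1,
    patience: int = 3,
) -> Optional[int]:
    """Return the first step where the candidate's smoothed loss exceeds the
    baseline's by more than rel_tol for `patience` consecutive steps, else None.

    Assumes positive losses (so a relative tolerance makes sense) and compares
    only the overlapping prefix of the two curves (zip truncates).
    """
    b = ema(baseline, alpha)
    c = ema(candidate, alpha)
    run = 0
    for step, (bl, cl) in enumerate(zip(b, c)):
        if cl > bl * (1 + rel_tol):
            run += 1
            if run >= patience:
                # Report where the sustained excursion began, not where it was confirmed.
                return step - patience + 1
        else:
            run = 0
    return None


if __name__ == "__main__":
    baseline = [1.0] * 50
    suspect = [1.0] * 20 + [1.5] * 30  # synthetic curve that jumps at step 20
    print(first_regression_step(baseline, suspect))
```

The `patience` requirement is there so a single noisy step doesn't get flagged; because the EMA spreads a large spike over several steps, a very large one-off spike can still trip it, so in practice you would tune `alpha`, `rel_tol`, and `patience` to the noise level of your own curves. Once you have a step number, you know which checkpoint or time window to profile.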