by yongjik 2013 days ago
> with dynamically generated graphs, the computational graph is never actually defined anywhere: the computation is traced out on the fly and behind the scene. You can no longer do anything interesting with the computational graph: for example, if the computation is slow, you can’t reason about what parts of the graph are slow.

Hmm, my experience is the opposite. When I used TensorFlow, there was no way I could figure out why something was slow or required huge amounts of memory. All I had was a gigantic black box.

Meanwhile, in PyTorch, all I have to do is run it with CUDA_LAUNCH_BLOCKING=1, and it will give me an accurate picture of exactly how many milliseconds each line is taking! (Just print the current time before and after the line.) With nvprof it will even tell you which CUDA kernels are executing.
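
For anyone who wants the "print the time before/after the line" trick in a reusable form, here is a rough sketch. The `timed` helper and the `timings` dict are names I made up for illustration, and the `time.sleep` call is just a stand-in for a real model step. It syncs the GPU before reading the clock (if torch and a CUDA device happen to be present), so the numbers should be meaningful even without CUDA_LAUNCH_BLOCKING=1.

```python
import time
from contextlib import contextmanager

try:
    import torch  # optional: the sketch still runs without it
except ImportError:
    torch = None


def _sync():
    # Wait for queued GPU work so the wall-clock reading covers finished kernels.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()


@contextmanager
def timed(label, results):
    """Hypothetical helper: store elapsed milliseconds for a block in `results`."""
    _sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        _sync()
        results[label] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("stand-in step", timings):
    time.sleep(0.01)  # placeholder for e.g. a forward pass

for label, ms in timings.items():
    print(f"{label}: {ms:.2f} ms")
```

Wrap each suspicious line in its own `with timed(...)` block and compare the numbers. The extra syncs slow things down, so this is for debugging only.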

* Disclaimer: Haven't dabbled in ML for ~a year, so my view might be outdated now.

1 comments

Eh. I love PyTorch, but it can definitely be difficult to reason about at times. For instance, due to async dispatch on GPU, you could get assertion errors where one line fails, but the real error was actually several lines above.

That was difficult to reason about.

Wouldn't this be fixed by CUDA_LAUNCH_BLOCKING=1? Or by putting a bunch of torch.cuda.synchronize() calls around the suspected lines.

lol whoops yeah that would definitely solve the problem. I wasn't familiar with `CUDA_LAUNCH_BLOCKING` but `torch.cuda.synchronize()` does work.
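
In case it helps someone hitting the same thing, a minimal sketch of both suggestions. The `checkpoint` helper is a hypothetical name, not part of PyTorch. The one detail worth knowing is that the environment variable has to be set before CUDA is initialized, so the simplest place is before importing torch (or on the command line when launching the script).

```python
import os

# Must be in place before CUDA initializes; setting it before the import is the safe spot.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch  # optional: the sketch still runs without it
except ImportError:
    torch = None


def checkpoint(tag):
    """Hypothetical helper: block until queued GPU work is done, so a pending
    kernel error is raised here (with a tag) instead of several lines later."""
    if torch is None or not torch.cuda.is_available():
        return False  # nothing to synchronize on this machine
    try:
        torch.cuda.synchronize()
    except RuntimeError as exc:
        raise RuntimeError(f"GPU error surfaced at checkpoint {tag!r}") from exc
    return True


# Sprinkle these between the suspected lines to narrow down the failing op.
checkpoint("after suspected indexing op")
```

With the env var set, every kernel launch blocks, so the traceback should point at the real culprit. The manual checkpoints are the cheaper option when you only suspect a few lines.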