Hacker News
by brrrrrm 1072 days ago
Driving multiple GPUs on the same node is better handled by threads, but Python is forced to use multiprocessing.
1 comment

Yeah, you're right. Even though CUDA is async, doing any preprocessing in Python can be harder if you don't have shared memory (the start-up latency of multiprocessing is not a problem in this context). I've only ever encountered "embarrassingly parallel" data-feeding problems, where the memory overhead of multiprocessing was small, but I can see that other situations exist. Comment retracted.
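
To make the shared-memory point concrete, here's a rough sketch of the thread side of the argument: several feeder threads preprocess their own shard of one buffer in place, with nothing pickled or copied between workers. The names (`preprocess_shard`, `run_workers`), the one-thread-per-GPU layout, and the squaring step are all made up for illustration; no GPU is touched, and in CPython the GIL still serializes pure-Python work like this, so it only illustrates the memory-sharing part.

```python
import threading


def preprocess_shard(buffer: list, worker_id: int, shard_size: int) -> None:
    """Stand-in preprocessing: square this worker's shard of the shared buffer in place."""
    start = worker_id * shard_size
    for i in range(start, start + shard_size):
        buffer[i] = buffer[i] ** 2


def run_workers(num_workers: int = 4, shard_size: int = 3) -> list:
    """Run one hypothetical feeder thread per GPU over a single shared buffer."""
    # Every thread sees this same list object; shards do not overlap,
    # so no lock is needed for this particular access pattern.
    buffer = list(range(num_workers * shard_size))
    threads = [
        threading.Thread(target=preprocess_shard, args=(buffer, w, shard_size))
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffer


if __name__ == "__main__":
    print(run_workers())
```

With multiprocessing, the same layout would need the buffer either copied into each worker or placed in explicitly shared memory, which is the extra effort the comment above is talking about.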