by _delirium
1889 days ago
NumPy does auto-seed the RNG if you don't pass a seed yourself, using platform-specific code to pull some entropy from the OS. So that common case is handled reasonably well, unlike C's rand(), which starts from the same fixed seed unless you call srand(). In fact, if you want exactly reproducible results (e.g. in test cases), you have to seed with a known seed to avoid that default behavior. The issue here is a little more subtle: if you fork 10 copies of your Python process, all 10 inherit the current RNG state and will thereafter produce identical random number sequences. If you were forking manually, you might guess that was a potential problem and re-seed the RNGs after forking. But PyTorch's data loaders fork a bunch of worker processes to do things in parallel, so users might not realize that they're using duplicate copies of their RNG state.
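
The fork behavior described above can be reproduced without PyTorch. The following is only a minimal sketch, assuming a Unix-like system where the "fork" start method is available and NumPy is installed; the helper names `_child` and `draws_from_forked_children` are made up for illustration and are not part of any library.

```python
import multiprocessing as mp

import numpy as np


def _child(queue, reseed):
    # Hypothetical helper: runs in the forked child process.
    if reseed:
        # Called with no argument, np.random.seed() re-seeds the global
        # RNG from fresh OS entropy, discarding the inherited state.
        np.random.seed()
    # Draw one number from NumPy's global RNG and send it to the parent.
    queue.put(int(np.random.randint(0, 2**31 - 1)))


def draws_from_forked_children(n=4, reseed=False):
    # Hypothetical helper: fork n children and collect one draw from each.
    # "fork" is requested explicitly; it is not available on Windows.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=_child, args=(queue, reseed)) for _ in range(n)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # Every child inherits the same global RNG state, so the draws match.
    print("inherited state:", draws_from_forked_children())
    # Re-seeding inside each child gives independent draws.
    print("re-seeded:      ", draws_from_forked_children(reseed=True))
```

Without re-seeding, every child should report the same number, because each one starts from a copy of the parent's RNG state. With `reseed=True` the draws should differ. Note that the standard library's `random` module re-seeds itself in forked children on recent Python versions, so it would not show the same effect.
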
Python multiprocessing doesn’t use fork on Windows. It starts a new process and so shouldn’t be affected by this.
So to trigger this you need to have num_workers > 0 on your DataLoader and be running on a non-Windows platform.