I suppose...

1) This is an issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to the closed NumPy issue (https://github.com/numpy/numpy/issues/9248), which just says: seed your random numbers, folks.

2) The PyTorch documentation covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted in, e.g., the tutorials (though it is in the FAQ: https://pytorch.org/docs/stable/notes/faq.html#dataloader-wo...).

3) It doesn't affect Windows, which uses spawn instead of fork.

4) To quote the author:

> I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.

^ No actual stats, just some vague hand-waving; this just seems like nonsense.

So, I suppose there's some truth to it being a documentation issue, but the title plus (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't. Oh well, spin it into some page views.
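For reference, the fix the linked docs describe is a `worker_init_fn` that reseeds NumPy (and `random`) from the per-worker torch seed. A minimal sketch, assuming a Linux/fork setup with `torch` and `numpy` installed; `RandomAugmentDataset` and `sample_batches` are made-up names for illustration, not anything from the article:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class RandomAugmentDataset(Dataset):
    """Hypothetical toy dataset whose items come from NumPy's global RNG."""

    def __len__(self):
        return 8

    def __getitem__(self, index):
        # Stands in for a random augmentation that calls np.random.
        return int(np.random.randint(0, 2**31 - 1))


def seed_worker(worker_id):
    # PyTorch already gives each worker a distinct torch seed
    # (base_seed + worker_id), so derive NumPy's and Python's seeds from it.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def sample_batches(num_workers=2):
    # A seeded generator makes base_seed (and thus every worker seed) reproducible.
    g = torch.Generator()
    g.manual_seed(0)
    loader = DataLoader(
        RandomAugmentDataset(),
        batch_size=2,
        num_workers=num_workers,
        worker_init_fn=seed_worker,
        generator=g,
    )
    return [batch.tolist() for batch in loader]


if __name__ == "__main__":
    for batch in sample_batches():
        print(batch)
```

Without `worker_init_fn`, forked workers can start from the same NumPy state, so different workers can hand back identical "random" values; seeding from `torch.initial_seed()` ties each worker's NumPy stream to its distinct torch seed.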
I had exactly the same thought: if they'd actually crawled GitHub, they'd have some nice plots to back up the claim.