by wokwokwok 1897 days ago
I suppose...

1) This is an issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to the closed NumPy issue (https://github.com/numpy/numpy/issues/9248), which just says: seed your random numbers, folks.

2) The PyTorch documentation covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted in, e.g., the tutorials (though it is in the FAQ: https://pytorch.org/docs/stable/notes/faq.html#dataloader-wo...).

3) It doesn't affect Windows, which starts worker processes with spawn instead of fork, so the workers don't inherit a copy of the parent's NumPy random state.

4) To quote the author:

> I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.

^ No actual stats, just some vague hand waving; this just seems like nonsense.

So, I suppose... there's some truth to it being a documentation issue, but I guess the title + (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't.

Oh well, spin it into some page views.
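
For reference, the "seed your random numbers" advice from (1) and (2) can be sketched roughly as below. This is a minimal sketch, not the code from the linked issues or docs: `seed_worker` and `base_seed` are hypothetical names, and the only PyTorch feature assumed is the `worker_init_fn` argument of `DataLoader`, which is called with the worker id in each worker process.

```python
import random
from functools import partial

import numpy as np


def seed_worker(worker_id, base_seed=0):
    """Give each data-loading worker its own NumPy and stdlib RNG seed.

    Hypothetical helper for illustration only.
    """
    # np.random.seed accepts integers in [0, 2**32 - 1], so wrap the sum.
    seed = (base_seed + worker_id) % 2**32
    np.random.seed(seed)
    random.seed(seed)
    return seed


# Assumed usage with PyTorch (not imported here so the sketch runs without it):
#   DataLoader(dataset, num_workers=4,
#              worker_init_fn=partial(seed_worker, base_seed=1234))
init_fn = partial(seed_worker, base_seed=1234)

# Simulate two workers in one process: each call reseeds the global NumPy RNG.
init_fn(0)
draws_worker0 = np.random.random(3)
init_fn(1)
draws_worker1 = np.random.random(3)
print(np.array_equal(draws_worker0, draws_worker1))
```

A module-level function wrapped in `partial` is used rather than a closure so that it can still be pickled under spawn. Note that with a fixed `base_seed`, this sketch hands each worker the same stream every time the workers are restarted, so it only addresses duplication across workers.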

2 comments

>No actual stats, just some vague hand waving; this just seems like nonsense.

I had exactly the same thought: if they'd actually crawled GitHub, they'd have some nice plots to back up the claim.

Better title? Over 95% of GitHub repos using NumPy and PyTorch aren't getting the random numbers they think they are.
Probably over the HN title character limit.
95% of GitHub repos using NumPy/PyTorch don't get the randomness they intended.