by wokwokwok 1897 days ago

I suppose...

1) This is a known issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to a closed NumPy issue (https://github.com/numpy/numpy/issues/9248) that just says: seed your random numbers, folks.

2) The PyTorch documentation covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted in, e.g., the tutorials (though it is in the FAQ: https://pytorch.org/docs/stable/notes/faq.html#dataloader-wo...).

3) It doesn't affect Windows, which uses spawn instead of fork to start worker processes, so workers don't inherit a copy of the parent's NumPy RNG state.

4) To quote the author:

> I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.

^ No actual stats, just some vague hand waving; this just seems like nonsense.
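For anyone who wants to see what (1)-(3) amount to, here is a minimal sketch using plain `multiprocessing` and NumPy rather than a real `DataLoader`. The helper names (`_worker`, `draws_from_forked_workers`) and the `base_seed + worker_id` scheme are my own illustration, not anything taken from the article or the PyTorch source, and it assumes a platform where the fork start method is available.

```python
import multiprocessing as mp

import numpy as np


def _worker(worker_id, seed, queue):
    """Runs in a child process: optionally reseed, then draw one number."""
    if seed is not None:
        # Possible fix: give this worker its own seed before drawing anything.
        np.random.seed(seed)
    # Draw from NumPy's global RNG, which is what the affected code relies on.
    queue.put((worker_id, int(np.random.randint(0, 2**31 - 1))))


def draws_from_forked_workers(num_workers, base_seed=None):
    """Return one draw per forked worker, ordered by worker id.

    base_seed=None leaves the inherited RNG state untouched (the problem case);
    otherwise worker i reseeds with base_seed + i (hypothetical seeding scheme).
    """
    ctx = mp.get_context("fork")  # assumption: fork is available (not Windows)
    queue = ctx.Queue()
    procs = []
    for worker_id in range(num_workers):
        seed = None if base_seed is None else base_seed + worker_id
        procs.append(ctx.Process(target=_worker, args=(worker_id, seed, queue)))
    for proc in procs:
        proc.start()
    results = dict(queue.get() for _ in procs)  # drain the queue before joining
    for proc in procs:
        proc.join()
    return [results[i] for i in range(num_workers)]


if __name__ == "__main__":
    # Every forked child copies the parent's RNG state, so the draws repeat.
    print("unseeded:", draws_from_forked_workers(4))
    # Per-worker seeds give each child its own stream.
    print("seeded:  ", draws_from_forked_workers(4, base_seed=1234))
```

Without a seed, each child starts from the same copied state and the draws come out identical; with a per-worker seed they differ and are reproducible. In a `DataLoader` the same reseeding would typically go through the `worker_init_fn` argument, which is what the linked docs describe.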

So, I suppose... there's some truth to it being a documentation issue, but I guess the title + (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't.

Oh well, spin it into some page views.

2 comments

anon_tor_12345 1897 days ago

>No actual stats, just some vague hand waving; this just seems like nonsense.

I had exactly the same thought: if they'd actually crawled GitHub, they'd have some nice plots to back up the claim.

unityByFreedom 1897 days ago

Better title? Over 95% of GitHub repos using NumPy and PyTorch aren't getting the random numbers they think they are.

sdfhbdf 1897 days ago

Probably over the HN Title character limit.

unityByFreedom 1897 days ago

95% of GitHub repos using NumPy/PyTorch don't get the randomness they intended.