HN Mirror



	by anon_tor_12345 1938 days ago
	This is probably because I never read these kinds of blog posts, but this is one of the most flagrantly clickbait titles I've ever seen. The article doesn't even suggest ditching numpy in favor of jax or some other hot take (which would at least warrant such a bombastic title); it literally just presents one instance in which you might be making a mistake when using numpy's RNG (and it's not even something unique to numpy). The PyTorch team is aware of this and hence exposes `worker_init_fn`. So the title should actually be "Using fork without understanding fork? You might be making a mistake."
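For reference, a minimal sketch of what such a `worker_init_fn` might look like. `seed_worker` and `BASE_SEED` are made-up names, not anything from the article or from PyTorch; the only PyTorch behavior assumed is that `DataLoader` calls `worker_init_fn` with the worker id inside each worker process.

```python
import numpy as np

BASE_SEED = 1234  # hypothetical constant; pick your own


def seed_worker(worker_id: int) -> None:
    """Re-seed NumPy's global RNG so each worker gets its own stream."""
    # SeedSequence mixes (BASE_SEED, worker_id) into well-separated seed words.
    seq = np.random.SeedSequence([BASE_SEED, worker_id])
    # The legacy global RNG accepts an array of 32-bit integers as its seed.
    np.random.seed(seq.generate_state(4))
```

You'd then pass it along the lines of `DataLoader(dataset, num_workers=4, worker_init_fn=seed_worker)`. With a fixed `BASE_SEED` the workers differ from each other but may replay the same streams every epoch, which is why a commonly suggested variant seeds from `torch.initial_seed() % 2**32` inside the worker instead.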

4 comments

wokwokwok 1938 days ago

I suppose...

1) This is an issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to the closed numpy issue (https://github.com/numpy/numpy/issues/9248), which just says: seed your random numbers, folks.

2) The documentation in pytorch covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted in, e.g., tutorials (though it is in the FAQ: https://pytorch.org/docs/stable/notes/faq.html#dataloader-wo...).

3) It doesn't affect Windows, which uses spawn instead of fork.

4) To quote the author:

> I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.

^ No actual stats, just some vague hand waving; this just seems like nonsense.
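Point 3 can be checked without PyTorch at all. A rough sketch, assuming a POSIX system where the `fork` start method is available; `draw` and `sample_from_children` are illustrative names only:

```python
import multiprocessing as mp

import numpy as np


def draw(queue, seed=None):
    """Runs in a child: optionally re-seed, then report one random integer."""
    if seed is not None:
        np.random.seed(seed)
    queue.put(int(np.random.randint(0, 2**31 - 1)))


def sample_from_children(n_children, reseed, start_method="fork"):
    """Collect one draw from each of n_children freshly started processes."""
    ctx = mp.get_context(start_method)  # "fork" is unavailable on Windows
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=draw, args=(queue, 1000 + i if reseed else None))
        for i in range(n_children)
    ]
    for p in procs:
        p.start()
    values = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return values


if __name__ == "__main__":
    # Forked children inherit a copy of the parent's global RNG state.
    print(sample_from_children(4, reseed=False))
    # Giving each child its own seed breaks the duplication.
    print(sample_from_children(4, reseed=True))
```

Without re-seeding, every forked child reports the same number. Passing `start_method="spawn"` should instead give each child a freshly imported (and freshly seeded) numpy, which is the Windows behavior described in point 3.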

So, I suppose... there's some truth to it being a documentation issue, but I guess the title + (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't.

Oh well, spin it into some page views.


anon_tor_12345 1938 days ago

>No actual stats, just some vague hand waving; this just seems like nonsense.

i had exactly the same thought - if they'd actually crawled github they'd have some nice plots to back up the claim.


unityByFreedom 1938 days ago

Better title? Over 95% of GitHub repos using NumPy and PyTorch aren't getting the random numbers they think they are.


sdfhbdf 1938 days ago

Probably over the HN Title character limit.


unityByFreedom 1937 days ago

95% of GitHub repos using NumPy/PyTorch don't get the randomness they intended.


gleenn 1938 days ago

OP said they scanned and found this problem in thousands of projects, including some that are probably copied heavily as examples, like ones from Nvidia. While the post might be a little strong, at least they back up their statement that many others are actually suffering from this problem.


coolreader18 1938 days ago

Maybe the intent is for it to be read as "If you're using pytorch and numpy, it's _very_ likely you're making this mistake", but the effect is still that the headline is clickbait


nerdponx 1937 days ago

It's so obviously clickbait that I wonder if it's meant to be tongue-in-cheek.


TruthWillHurt 1937 days ago

"I downloaded over a hundred thousand repositories from GitHub that import PyTorch... Out of these, over 95% of the repositories are plagued by this problem."

Title seems pretty accurate to me!
