by anon_tor_12345 1891 days ago
This is probably because I never read these kinds of blog posts, but this is one of the most flagrantly clickbait titles I've ever seen. The article doesn't even suggest ditching numpy in favor of jax or some other hot take (which would at least warrant such a bombastic title); it literally just presents one instance in which you might be making a mistake when using numpy's rng (and not even something unique to numpy). The PyTorch team is aware of this, which is why it exposes `worker_init_fn`. So the title should actually be "Using fork without understanding fork? You might be making a mistake."
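For anyone who hasn't used `worker_init_fn`, here is a minimal sketch of the usual pattern, assuming the per-worker seed is taken from `torch.initial_seed()` inside the worker. The names `seed_worker` and `worker_seed_32bit` are made up for illustration, and the third-party imports are deferred into the function so the snippet loads even without torch or numpy installed.

```python
def worker_seed_32bit(torch_seed: int) -> int:
    """Fold a (possibly 64-bit) seed into the 32-bit range numpy.random.seed accepts."""
    return torch_seed % 2**32


def seed_worker(worker_id: int) -> None:
    """Hypothetical worker_init_fn: reseed NumPy and stdlib random in each worker."""
    import random

    import numpy as np
    import torch

    # Inside a DataLoader worker, torch.initial_seed() already differs per worker,
    # so deriving the other seeds from it gives each worker its own stream.
    seed = worker_seed_32bit(torch.initial_seed())
    np.random.seed(seed)
    random.seed(seed)


# Assumed usage (the dataset is whatever you already have):
# loader = torch.utils.data.DataLoader(dataset, num_workers=4, worker_init_fn=seed_worker)
```

The modulo is only there because `np.random.seed` rejects values outside 0 to 2**32 - 1.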
4 comments

I suppose...

1) This is an issue from 2018 (https://github.com/pytorch/pytorch/issues/5059), which links to the closed numpy issue (https://github.com/numpy/numpy/issues/9248), which just says: seed your random numbers, folks.

2) The documentation in pytorch covers this (https://pytorch.org/docs/stable/data.html#randomness-in-mult...), but it's not really highlighted specifically in, e.g., tutorials (though it is in the FAQ: https://pytorch.org/docs/stable/notes/faq.html#dataloader-wo...).

3) It doesn't affect Windows, which uses spawn instead of fork.

4) To quote the author:

> I downloaded and analysed over a hundred thousand repositories from GitHub that import PyTorch. I kept projects that use NumPy’s random number generator with multi-process data loading. Out of these, over 95% of the repositories are plagued by this problem.

^ No actual stats, just some vague hand waving; this just seems like nonsense.
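To make (1)-(3) concrete, here is a rough sketch of why fork gives every worker the same numbers. It does not actually fork anything: it models fork by copying the parent's generator state into each "worker", and uses the stdlib `random.Random` as a stand-in for numpy's global RNG. The function names and the `base_seed + worker_id` scheme are assumptions for illustration only.

```python
import random


def simulate_forked_workers(parent: random.Random, num_workers: int, draws: int = 3):
    """Model fork(): every worker starts from a copy of the parent's RNG state."""
    state = parent.getstate()
    results = []
    for _ in range(num_workers):
        child = random.Random()
        child.setstate(state)  # same state in every worker, as after a fork
        results.append([child.randint(0, 99) for _ in range(draws)])
    return results


def simulate_seeded_workers(base_seed: int, num_workers: int, draws: int = 3):
    """The fix: give each worker its own seed (here base_seed + worker_id)."""
    results = []
    for worker_id in range(num_workers):
        child = random.Random(base_seed + worker_id)
        results.append([child.randint(0, 99) for _ in range(draws)])
    return results


parent = random.Random(1234)
forked = simulate_forked_workers(parent, num_workers=4)
seeded = simulate_seeded_workers(base_seed=1234, num_workers=4)

print("forked:", forked)  # every worker prints the same draws
print("seeded:", seeded)  # draws differ between workers
```

With spawn, each worker starts a fresh interpreter instead of inheriting the parent's state, which is why (3) says Windows isn't affected.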

So, I suppose... there's some truth to it being a documentation issue, but I guess the title + (1-3) kind of say to me: OP thought they discovered something significant... turns out, they didn't.

Oh well, spin it into some page views.

>No actual stats, just some vague hand waving; this just seems like nonsense.

I had exactly the same thought: if they'd actually crawled GitHub, they'd have some nice plots to back up the claim.

Better title? Over 95% of GitHub repos using NumPy and PyTorch aren't getting the random numbers they think they are.
Probably over the HN Title character limit.
95% of GitHub repos using NumPy/PyTorch don't get the randomness they intended.
OP said they scanned and found this problem in thousands of projects, including some that are probably copied heavily as examples, like ones from Nvidia. While the post might be a little strong, at least they back up their statement that many others are actually suffering from this problem.
Maybe the intent is for it to be read as "If you're using pytorch and numpy, it's _very_ likely you're making this mistake", but the effect is still that the headline is clickbait.
It's so obviously clickbait that I wonder if it's meant to be tongue-in-cheek.
"I downloaded over a hundred thousand repositories from GitHub that import PyTorch... Out of these, over 95% of the repositories are plagued by this problem."

Title seems pretty accurate to me!