by acdha 1891 days ago
Possibly but this is the kind of boilerplate which people tend to ignore, especially when a program is non-trivial. It’s really easy to notice if you’re doing something like `seed_rng(); fork();` but once there’s distance and more than one thing being passed around I’d be surprised if you didn’t find the same pattern, perhaps a bit less common.

Fundamentally, there are two problems: fork() is a performance trick for doing setup only once, and seeding an RNG is a type of setup that, unintuitively, can’t be optimized that way; and since most people learn from a tutorial or quick start, this is exactly the kind of important but non-core issue that gets omitted or ignored in that context.


Additionally, I think people make a hidden assumption that they don't even realize they're making: that when you ask for random numbers from numpy, they're more or less "true" random numbers, not seeded ones. Like, I think the programmer's intention is just "give me a bunch of random numbers, I don't really care how as long as they're random", and they assume that is what that numpy function does. But it doesn't: it gives you a pseudo-random sequence, not true randomness, so of course the sequence is identical after the fork.

Like, they think they're reading from /dev/random, but they're not: they're just running rand() (metaphorically speaking).
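To make that concrete, here's a rough sketch of the effect. It's not numpy: I'm using a stdlib `random.Random` instance as a stand-in for any explicitly seeded generator, `draw_in_child` is a made-up helper, and it needs a POSIX system since it calls `os.fork()`. (I use an explicit instance because the module-level `random` functions are reseeded in the child after a fork on Python 3.7+, which would hide the problem.)

```python
import json
import os
import random


def draw_in_child(rng, n=3, reseed=False):
    """Fork, draw n numbers from rng in the child, return them to the parent.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: rng is a copy of the parent's generator, state included.
        os.close(read_fd)
        if reseed:
            rng.seed()  # no argument: reseed from OS entropy
        values = [rng.random() for _ in range(n)]
        os.write(write_fd, json.dumps(values).encode())
        os._exit(0)
    # Parent: collect the child's numbers and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        values = json.loads(pipe.read())
    os.waitpid(pid, 0)
    return values


rng = random.Random(1234)  # "setup" done once, before forking

# Two children forked from the same parent state draw the same numbers.
first = draw_in_child(rng)
second = draw_in_child(rng)
print(first == second)

# Reseeding inside each child breaks the duplication.
fresh_a = draw_in_child(rng, reseed=True)
fresh_b = draw_in_child(rng, reseed=True)
print(fresh_a == fresh_b)
```

The parent's generator never advances here, because every draw happens in a child's private copy, so each fork starts from the same point in the sequence unless the child reseeds.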

Definitely - back when I supported a computational neuroscience group that came up multiple times (not numpy but similar contexts), along with the various quirks around floating point math. Even experienced people do things like that because they’re focused on the actual problem and this is a leaky implementation detail.