by Emore 5438 days ago
As a computer scientist, one wonders what kind of randomness is involved. Is the date picked uniformly at random, or is there a bias (for example, a bias against the coming days)?

The exact second is picked uniformly at random using an approximation of 30 days per month. The code is:

    my $sendAt = time + int(rand()*86400*30*5) + 86400*30;

You should use a Mersenne Twister rather than rand().

That rand() probably only has 32768 discrete values (0 through 32767). So your time is quite imprecise, in addition to being biased. (C's rand() is very biased, so if that code forwards to rand(), then the result is biased, not uniform.)
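
To put a number on "quite imprecise", here is a rough sketch in Python rather than the thread's Perl. It assumes, as the comment above guesses, that rand() can only return one of 32768 evenly spaced values; that figure is an assumption about the platform, not something verified here. The names `WINDOW_SECONDS`, `DISTINCT_VALUES`, and `reachable_offsets` are made up for the illustration.

```python
WINDOW_SECONDS = 86400 * 30 * 5   # the five-month span from the Perl one-liner
DISTINCT_VALUES = 32768           # ASSUMPTION: a 15-bit rand(), i.e. RAND_MAX == 32767

def reachable_offsets(window=WINDOW_SECONDS, levels=DISTINCT_VALUES):
    """Offsets int(r * window) can produce when r is restricted to k / levels."""
    # Integer arithmetic avoids any floating-point rounding in the illustration.
    return sorted({k * window // levels for k in range(levels)})

offsets = reachable_offsets()
gaps = [b - a for a, b in zip(offsets, offsets[1:])]

print(len(offsets), "reachable seconds out of", WINDOW_SECONDS)
print("gap between neighbouring send times:", min(gaps), "to", max(gaps), "seconds")
```

Under that assumption the possible send times sit roughly six and a half minutes apart, which supports the "not that it really matters much in this case" remark when the scale of interest is days.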

Not that it really matters much in this case.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES...

Interesting, I didn't know Perl's rand() was that terrible. Hopefully, the auto-seeding using srand will add some entropy. Also, since I'm choosing an epoch second, the times should hopefully be uniformly distributed at the scale of days...

Actually, if you seed your random number generator more than once ("re-seed"), then you're completely destroying your entropy. Which is obviously the exact opposite of what you're trying to achieve.

Just curious... why exactly would you say reseeding adds bias? AFAIK, srand in Perl on Linux uses /dev/urandom, which uses at least some bits from the hardware entropy pool.

Swapped out rand() for MT in any case... :)
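
For anyone following along, the swapped-in version might look something like the sketch below. It is written in Python rather than Perl, because CPython's `random.Random` is a Mersenne Twister (MT19937); it is not the code the site actually runs. `pick_send_time`, `DAY`, and `MONTH` are hypothetical names.

```python
import random
import time

DAY = 86400
MONTH = 30 * DAY  # the thread's approximation of 30 days per month

def pick_send_time(now=None, rng=None):
    """Return an epoch second drawn uniformly from [now + 1 month, now + 6 months)."""
    if now is None:
        now = int(time.time())
    if rng is None:
        rng = random.Random()  # seeded once by the library from OS entropy
    # randrange picks an integer directly, so every second in the window is reachable.
    return int(now) + MONTH + rng.randrange(5 * MONTH)

if __name__ == "__main__":
    print(pick_send_time())
```

Passing in an `rng` lets a long-running process create the generator once and reuse it for every draw.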

Wow, that was fast. Nice.

The truth is this: if you ever hear the phrase "re-seeding", you should reinterpret it as "warning! danger!" because if you seed a random number generator more than once, you destroy any entropy ('true' randomness) that generator otherwise might've had.

When you seed a generator, you're saying "give me a queue of uniformly random numbers. Each time I call rand(), pop one from the queue and return it."

If you re-seed, you're saying "throw that queue away; give me a different queue of uniformly random numbers".

If you do that for every rand(), then you no longer have a queue of uniformly-random numbers. You have bias.

One way to think about this is: the resulting output over time is no longer uniformly random, because it's the first random number of every queue of uniformly random numbers (every seed). And the first numbers of successive seeds are not guaranteed to be uniformly random. The generator wasn't designed for that.

It's both hard to understand and hard for me to explain, sorry. But if you want to know more, feel free to ask more questions and I'll do my best.

tl;dr: if you seed Mersenne Twister (or any other RNG) more than once, you'll be losing most of the benefits of the Twister (from a mathematical point of view). So don't! =)
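
One concrete way re-seeding goes wrong can be sketched as follows, in Python rather than Perl. The sketch assumes the seed comes from something coarse, such as the current clock second; that is an assumption for illustration, not how Perl's srand necessarily behaves. It only shows the most visible failure (repeated outputs) and does not measure the subtler statistical bias described above. The function names are hypothetical.

```python
import random

def draws_seeded_once(n, seed):
    """Seed a single generator once, then take n values from its stream."""
    rng = random.Random(seed)
    return [rng.randrange(1_000_000) for _ in range(n)]

def draws_reseeded_each_time(n, clock_second):
    """Re-seed before every draw, using a seed that has not changed in between."""
    out = []
    for _ in range(n):
        rng = random.Random(clock_second)  # ASSUMPTION: coarse, clock-like seed
        out.append(rng.randrange(1_000_000))
    return out

once = draws_seeded_once(5, 1234)
again = draws_reseeded_each_time(5, 1234)

print("seeded once:      ", once)
print("re-seeded per draw:", again)
```

The second list repeats a single value, because each draw is just the first element of the same queue.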

If I had to design this, I would pick the time according to a power-law curve with closer times being more probable. In fact I would be willing to send ALL my normal emails through such a filter.
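
A minimal sketch of that idea, in Python, using inverse-transform sampling from a truncated power law. The exponent `alpha`, the one-day minimum, and the 180-day maximum are all assumed values chosen for illustration; the comment above does not specify any of them, and `power_law_delay` is a hypothetical name.

```python
import random

DAY = 86400

def power_law_delay(rng, t_min=DAY, t_max=180 * DAY, alpha=2.0):
    """Sample a delay in seconds with density proportional to t**(-alpha) on [t_min, t_max]."""
    if alpha == 1.0:
        raise ValueError("alpha == 1 needs the logarithmic form of the inverse CDF")
    u = rng.random()                  # uniform in [0, 1)
    a = t_min ** (1.0 - alpha)
    b = t_max ** (1.0 - alpha)
    # Invert the CDF: u = 0 maps to t_min, and u close to 1 maps to t_max.
    return (a + u * (b - a)) ** (1.0 / (1.0 - alpha))

rng = random.Random()                 # seeded once, reused for every draw
delays_in_days = [power_law_delay(rng) / DAY for _ in range(10)]
print([round(d, 1) for d in delays_in_days])
```

With these assumed parameters most delays land within the first several days, and only a small fraction stretch out to months.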