Hacker News new | ask | show | jobs
by trisiak 2381 days ago
The distribution of numbers is not uniform across the 0..googol here. The implementation first chooses the length of the number, then fills in the digits. This makes it biased towards smaller numbers.

Does it still have the same optimal strategy as for uniform generation?

3 comments

The distribution chosen here is ~ uniform for the number of digits. (https://github.com/victorqribeiro/googol/blob/master/js/main... )

This is not the same hypothesis as the problem stated in the video. The video talks about the "secretary problem". In this problem you know nothing about the distribution. You just know that the number are ordered.

Here because you know the distribution you can do better by using a strategy which choose to stop depending on the value of the number you have so far. For example if at the 15th pick you get a number with 99 digit, you can safely assume that it is better to stop now (every time you pick you have only a ~1% chance of finding a 99-digit number needed to do better so keeping it means you win (99/100)^(50-16) = 70%, whereas the non-exploitable strategy (i.e. optimal strategy for the secretary problem) 50/e=18 will tell you to keep going and you would lose 70% of the times).

For the secretary problem the theoretical way to sample the number is to set it up as a game where the first player plays the game and the second pick the numbers adversely so that the first doesn't win. A good strategy for the second player is to pick a distribution from a big set of varied distributions, and sample the number from this distribution.

Yep. The optimal strategy depends only on the relative ordering of elements, not on their magnitudes or any other property. So the strategy is the same for any continuous distribution.

(With a discrete distribution, the possibility of ties slightly affects things. But in this particular game, ties seem to be very improbable, so they can be ignored.)

It isn't optimal for known distributions. If you're sampling from a normal distribution with known parameters, and the first of 100 samples is 5 standard deviations above the mean, you take it.
>"The implementation first chooses the length of the number, then fills in the digits. This makes it biased towards smaller numbers."

Why?

Consider a two digit number. With this algorithm, half of the numbers generated will be less than 10, and half will be 11-99.

Likewise with the full 100 digits this algorithm has a 1% chance of generating a number less than 10, while the correct proportion would be 1e-100.

The algorithm also does not generate numbers with zero in them (presumably to avoid leading zeroes, but this will skew results somewhat).

I think the correct algorithm would be to generate 100 digits and then remove the leading zeroes.

Also relevant (and in my mind, the underlying explanation for Benford's Law):

https://en.m.wikipedia.org/wiki/Jeffreys_prior

Just getting into bayesian inference and this sent me down a rabbit hole, thanks :)
Damn...
People, I just ran this simple script on my terminal using Node and I got a pretty uniform result.

  a = {}
  for(let i = 0; i < 100000; i++){
    let n = Math.floor(Math.random() * 10);
    if( n in a )
      a[n] += 1
    else
      a[n] = 1
  }


My result:

{ '0': 9917, '1': 10015, '2': 9957, '3': 10107, '4': 10019, '5': 10037, '6': 10120, '7': 9914, '8': 10042, '9': 9872 }

> real-life sets of numerical data

Today you learned: Node’s random number generator isn’t selecting from a real-life set of numerical data

yeah, I just mixed two informations. I thought the other person said my algorithm would be biased towards small numbers cause I use the JS random function to pick the number of decimal places for the number
Yep, sorry, didn't mean to say this directly applied, just a neat concept that's somewhat related.
Sure but that’s not the algorithm you used in the repo.