Hacker News new | ask | show | jobs
by TekMol 1878 days ago
Ah yes, you are right. In that sense, it is a random permutation of the numbers from 1 to max.

Yes, when initialized with "size=5" the numbers in the output must be precisely 1,2,3,4,5 but in random looking order.

And I don't want to store the whole sequence. I want to say gimmeTheNumberAtPosition(x) and it return just that number.

7 comments

Generating random permutations is quite a different problem from creating random numbers. The standard algo for random permutations is the Fisher-Yates Shuffle.

https://en.wikipedia.org/wiki/Fisher–Yates_shuffle

ETA: As you don't want to store the permutation, you might want to pick a number randomly from 1 to n!, and then generate the permutation on the fly up to the desired element, using the techniques outlined here:

https://stackoverflow.com/questions/29236556/how-can-i-calcu...

Storing a number of the order of magnitude of n! will take the same amount of memory than storing a permutation of n elements, so what's the point ?
If one really wants to do it the slow and hard way, it's possible to compute a random permutation of n elements one element at a time with n bits of storage (less with fairly generic compression techniques): just remember the set of numbers that have been produced so far and sequentially pick one of the remaining ones. You'll waste some combination of additional memory (for arithmetic decoding state) and random bits (for rejected samples), but materializing a large randomly permuted vector should be slower.
Good point. Storing a number of magnitude n! will take n log n bits, and storing n numbers up to n will also take n log n bit.

In other words, good old Fisher Yates, storing the resulting permutation, and a simple lookup is probably the way to go.

I guess you could predictably enumerate the permutations, so that you could lookup with parameters size/n, position/i, seed/r - where r is the permutation.

For size=1, all parameters are 1, and only result is 1.

For size 2, r can be 1 or 2, naming the permutations [1,2];[2,1] - and eg: rand_at(n=2,r=2,i=2) would return 2, but i=1, would return 2.

I'm not sure how I'd implement this - I suppose it might be possible to generate a predictable "walk" based on n/r/i?

But for sizes less than, say, a million i would think that a shuffled array would be easier?

r would need to be in the range 1..n!

You can construct arbitrarily-sized permutations with log(n) time random access using Sometimes-Recurse Shuffle: https://eprint.iacr.org/2013/560.pdf.
What you want is a cipher from 1..N to 1..N. This problem is often approached as an encryption problem known as Format Preserving Encryption. The Sometimes Recurse Shuffle linked elsewhere in this thread is one solution but the most foundational and accessible paper on this is Black and Rogaway's "Ciphers with Arbitrary Domains" The Generalized Feistel Network (Method 3) is fast and easy to implement.

https://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf

So you want a mapping of [1, n] to a permutation of [1, n] without having to generate the list [1, n] and shuffling it. An affine transformation modular n should work. Instead of [1, n], look at [0, n). Find a value `a` in [0, n) such that gcd(a, n) = 1 and pick a random integer b in [0, n). Then the `random` number at each position is `a * x + b mod n`.

This is simple and fast, but is not secure at all. You can solve for a, b by solving the linear congruence. It also does not generate every permutation of `n`. For n = 5, only 20 sequences can be found out of 5! = 120.

The value `a` has to fulfill some more properties: https://en.wikipedia.org/wiki/Linear_congruential_generator#...

Since the "randomness" of the permutation is not that great, I was looking for something better, but could not find anything. The closest I got was https://en.wikipedia.org/wiki/Xorshift#xoshiro_and_xoroshiro which only works for powers of two. A workaround would be to choose the next larger power of two and reject all random values which are smaller than `n`, but that introduces unpredictable latency and destroys the cool jump-ahead feature.

The LCG is the improved extension which causes you to have to compute values x_0 = seed to x_n in order to calculate x_{n+1}. What I am talking about is just a mapping of index x -> f(x) which is what the OP seemed to have wanted. In that case, `a` only needs to be relatively prime. This cannot account for every permutation of n since the number of values relatively prime to n, the totient, has an upper bound of n, which is the possible values for a. The number of values for b is also n, so at most n^2 possible sequences are generated, which is less than n! for n > 3.

For example, with n = 5: Let a = 3 and b = 2. x = [0, 1, 2, 3, 4], a * x = [0, 3, 6, 9, 12], a * x + b = [2, 5, 8, 11, 14], a * x + b mod n = [2, 0, 3, 1, 4]

If your value of size is a power of two you could just encrypt x with a block cipher where the block size is equal to size and the key is your seed.
for JS you can probably just do `[...Array(n).keys()].sort(()=>Math.random() - 0.5)` ?
Having an unstable comparator might wreak havoc and maybe not give a proper distributed space of possible results?

Edit: yes, here is an example (in the comments) of how biased this is: https://stackoverflow.com/a/18650169/923847

That is exactly the algorithm that Microsoft used to bias its purportedly random “browser choice” screen. Don’t use it.

https://www.robweir.com/blog/2010/02/microsoft-random-browse...

The Fisher–Yates shuffle is the right way to shuffle an array in an unbiased way.

https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle