Hacker News new | ask | show | jobs
by keithgabryelski 4122 days ago
it will, in general, instantiate N-n/N records (even distribution, right - N=total records, n=sample size) that is still too much data to fetch to the client for any large data set.
1 comments

You don't have to instantiate the records; that's just the way I did it here. For example, you could do two passes: one to randomly select IDs, and one to fetch a small number of records for the sample set. That's O(2N), but still better than what most databases will do for an ORDER BY RANDOM() query.

There's also another, less-known variant of this algorithm that I'll go into in a future post that alleviates the concern.