it will, in general, instantiate N-n/N records (even distribution, right - N=total records, n=sample size)
that is still too much data to fetch to the client for any large data set.
You don't have to instantiate the records; that's just the way I did it here. For example, you could do two passes: one to randomly select IDs, and one to fetch a small number of records for the sample set. That's O(2N), but still better than what most databases will do for an ORDER BY RANDOM() query.
There's also another, less-known variant of this algorithm that I'll go into in a future post that alleviates the concern.
There's also another, less-known variant of this algorithm that I'll go into in a future post that alleviates the concern.