by gregable
407 days ago
Very well put together. If you are curious about the weighted version, I tried to explain it some here: https://gregable.com/2007/10/reservoir-sampling.html There's also a distributed version, which is easy with a MapReduce. Or the very simple algorithm: generate a random number for each item in the stream and keep the top N items ordered by that random number.
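For anyone who wants to see the "very simple algorithm" spelled out, here is a minimal Python sketch of one way to do it. The function name `sample_top_n` and its parameters are my own for illustration, not from the linked post. It assumes uniform (unweighted) sampling and uses a min-heap so that only N entries are held in memory at a time.

```python
import heapq
import random


def sample_top_n(stream, n, rng=random):
    """Return up to n items sampled uniformly without replacement.

    Each item gets an independent random key; the n items with the
    largest keys are kept. `rng` only needs a .random() method.
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (key, index, item); heap[0] has the smallest key
    for index, item in enumerate(stream):
        key = rng.random()
        # index breaks ties so items themselves are never compared
        entry = (key, index, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the current smallest kept key: swap it in
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


if __name__ == "__main__":
    print(sample_top_n(range(1000), 5))
```

Because the keys are independent of the items, the same idea carries over to the distributed case: each worker keeps its own top N by key, and merging the workers' results and taking the top N again should give the same distribution, provided the keys are carried along with the items.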
|
I discuss these issues more here: https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....