Hacker News new | ask | show | jobs
by tmoertel 461 days ago
For the particular case of the exponential distribution we can go further. By taking advantage of the theory of Poisson processes, we can take samples using a parallel algorithm. It even has a surprisingly succinct SQL translation:

    SELECT *
    FROM Population
    WHERE weight > 0
    ORDER BY -LN(1.0 - RANDOM()) / weight
    LIMIT 100  -- Sample size.
Notice our exponentially distributed random variable on prominent display in the ORDER BY clause.

If you're curious, I explore this algorithm and the theory behind it in https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....

1 comments

Quite off-topic, but do you know when you'll write the article about CPS, if ever?
Oops. I had quite forgotten that I need to write about that. I said I would over a decade ago, so that's a long time for you to wait. Sorry about that.

I mainly write for myself, so I need the time and the motivation. Until recently, my job at G took up my time and also provided an internal community where I could scratch the writing itch, which reduced the motivation for public writing on my blog. But now that I'm semi-retired, I'll try to write more frequently.

Thanks for the accountability!