Hacker News new | ask | show | jobs
by mattpharr 1491 days ago
Rejection sampling is not a good choice here.

First, the cost is 2 FMAs * the cost of generating 2 random numbers * the number of rejection sampling iterations. On average, you reject (4-pi)/4 ~= 0.25 of the time. However, if you're running on a GPU in a warp (or the equivalent) of 32 threads, then you pay the cost of the maximum number of rejections over all the threads.

The bigger issue is that a direct mapping from [0,1]^2 to the disk, as is described in this tweet, if you have well-distributed uniform samples in [0,1]^2, you get well-distributed samples on the disk. Thus, stratified or low discrepancy samples in [0,1]^2 end up being well distributed on the disk. In turn, this generally gives benefits in terms of error with Monte Carlo integration of functions over the disk.

1 comments

You can have the threads cooperate, distributing from those that got extra to those that were unlucky.