| I think I would call it naive algorithm rather than greedy. It looked like an interesting problem so I spent some time this morning exploring if there would be any performance improvement by pregenerating an array of X items (where X around 1M to 16M items) and then randomly returning one of them at a time. I explored the project and copied the functions to be as faithful as the original implementation as possible. Generating 10M unit sphere (best of 3 runs, g++ 13, linux, Intel i7-8565U, one core for tests): - naive/rejection: ~574ms
- analytical: ~1122ms
- pregen 1M elements: ~96ms
That's almost 6x faster than rejection method. Setup of the 1M elements is done once and does not count on the metrics. Using double type, using float yields around 4x improvements.After looking at those results I decided to try on the project itself, so I downloaded, compiled and applied similar optimizations in the project, only updating circle and sphere random generators (with 16M unit vectors that are only created once on app lifetime) but got almost no noticeable benefits (marginal at most). Hard to tell because of the random nature of the raytracing implementation. On the bright side the image quality was on par. Honestly I was afraid this method would generate poor visuals. Just for the record, I'm talking about something as simple as: std::vector<Vec3> g_unitSpherePregen;
uint32_t g_unitSpherePregenIndex = 0;
void setupUnitSpherePregen(uint32_t nElements) {
g_unitSpherePregen.resize(nElements);
for (auto i = 0; i < nElements; i++) {
g_unitSpherePregen[i] = unitSphereNaive(); // call the original naive or analytical method
}
}
Vec3 unitSpherePregen() {
g_unitSpherePregenIndex = (g_unitSpherePregenIndex + 1) % g_unitSpherePregen.size();
return g_unitSpherePregen[g_unitSpherePregenIndex];
}
I tried as well using a psrng (std::mt19937 and xorshf96) in unitSpherePregen instead of the incremented variable, but increment was faster and yielded good visual results.Next step would be profiling, but I don't think I will invest more time on this. Edit: fix formatting |
The idea is based on the Ziggurat Method. You overlap the circle with n boxes that each encapsulate the same amount of area of the underlying circle, select a random box, and then do rejection.
With 128 boxes, this reduces the average number of additional iterations from 27% to 0.7%, which should massively reduce the number of branch miss predictions.
It ended up about 2x faster the simple rejection method.
I haven't applied this to spheres yet, but that should also be possible.