|
|
|
|
|
by kmill
1968 days ago
|
|
I ended up writing one in the meantime -- check out the end of the readme. For this workload it seems to be a factor of 30x. The C program gets away with no heap allocation and inlining the entire render loop, though Lean does do quite a lot of unboxed float arithmetic. I tried to optimize the Lean and only shaved 25% off the runtime (code in a branch). I suspect that it could be made a bit faster if each thread had its own copy of the scene. Lean handles reference counting of shared objects differently. |
|