| From article: > With two cross-overs, there is little difference between the performance of the OCaml and C++ implementations. Note that the C++ was compiled with CPU-specific optimizations whereas the OCaml binary will run on any 486 compatible. OK, so the author is suggesting that compiler-generated SSE2+ code from numerical C++ is comparable to OCaml code targeting a 486, which leaves only the slow, stack-based x87 FPU available? SSE beats x87 hands down even without vectorization, using a single scalar float/double per XMM register! That should make a significant difference in any ray tracer worth two cents. I didn't bother reading the source code, but I think it's very likely he didn't write the C++ the way you should for a performance-sensitive numerical application. Or maybe the code for the scene graph, bounding boxes and hit tests was bad? FPU performance only starts to matter once you actually find the object the ray intersects... Anyway, you usually use C++ because of specific performance requirements, not to get the shortest or cleanest possible implementation. |
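
To make the x87 vs. scalar SSE point concrete, here is a rough sketch of the kind of scalar floating-point hit test a ray tracer spends its time in. It is not taken from the article's source: `Vec3`, `sub`, `dot` and `ray_sphere` are hypothetical names made up for illustration, and the ray direction is assumed to be unit length.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical minimal vector type for illustration only.
struct Vec3 { double x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Distance along the ray to the nearest intersection in front of the origin,
// or +infinity if the ray misses the sphere. Assumes dir is unit length.
inline double ray_sphere(Vec3 origin, Vec3 dir, Vec3 center, double radius) {
    assert(radius > 0.0);
    const double miss = std::numeric_limits<double>::infinity();
    Vec3 oc = sub(center, origin);
    double b = dot(oc, dir);                        // center projected onto the ray
    double disc = b * b - dot(oc, oc) + radius * radius;
    if (disc < 0.0) return miss;                    // ray passes beside the sphere
    double s = std::sqrt(disc);
    double t_far = b + s;
    if (t_far < 0.0) return miss;                   // sphere is entirely behind the origin
    double t_near = b - s;
    return t_near > 0.0 ? t_near : t_far;           // origin inside sphere: use far hit
}
```

With GCC, one way to compare the two code paths would be to build a benchmark around this once with `-O2 -m32 -mfpmath=387` and once with `-O2 -msse2 -mfpmath=sse`; on x86-64, scalar SSE2 is already the default for floating point. Whether the article's build used anything like these flags is not something the quote tells us.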