|
One case I benchmarked was Bernstein/Bézier and Lagrange element evaluation. This is: given a degree d triangle or tetrahedron, given some barycentric coordinates, get the physical coordinate and the Jacobian matrix of the mapping. Degree 2, Lagrange: - runtime: 3.6M/s
- Hana: 16.2M/s
- Hardcoded: 37.7M/s Degree 3, Lagrange: 2.6M/s, 6.4M/s, 13.4M/s (same order). "Runtime" here means everything is done using runtime loops, "Hana" using Boost Hana to make loops compile-time and use some constexpr ordering arrays, "hardcoded" is a very Fortran-looking function with all hardcoded indices and operations all unrolled. As you see, using Boost Hana does bring about some improvement, but there is still a factor 2x between that and hardcoded. This is all compiled with Release optimization flags. Technically, the Hana implementation is doing the same operations in the same order as the hardcoded version, all indices known at compile time, which is why I say there must be some runtime overhead to using hana::while. In the case of Bernstein elements, the best solution is to use de Casteljau's recursive algorithm using templates (10x to 15x speedup to runtime recursive depending on degree). But not everything recasts itself nicely as a recursive algorithm, or I didn't find the way for Lagrange anyways. I did enable flto as, from my understanding (looking at call stacks), hana::while creates lambda functions, so perhaps a simple function optimization becomes a cross-unit affair if it calls hana::while. (speculating) Similar results to compute Bernstein coefficients of the Jacobian matrix determinant of a Q2 tetrahedron, factor 5x from "runtime" to "hana" (only difference is for loops become hana::whiles), factor 3x from "hana" to "hardcoded" (the loops are unrolled). So a factor 15x between naive C++ and code generated files. In the case of this function in particular, we have 4 nested loops, it's branching hell where continues are hit very often. |
That would be fairly interesting to look at the actual code you've used, and have a look at the codegen. By a chance, is it viable for you to open-source it? I'd guess it should bear lots of interest for Hana author/s.
What compiler/version did you use? For example, MSVC isn't (at least wasn't) good at always evaluating `constexpr` in compile-time...
> hana::while creates lambda functions, so perhaps a simple function optimization becomes a cross-unit affair if it calls hana::while. (speculating)
Hmm, I'd say it (LTO) shouldn't influence, as these lambdas are already fully visible to a compiler.