by jvz01
1761 days ago
|
|
A small side note: the algorithm as given is scalar, but it is branch-free and defined entirely in the header file. Compilers will therefore typically be able to inline and auto-vectorize it when it is called in a loop, giving a speedup that scales roughly with the vector width. I also see a potential (but architecture-dependent) optimization: evaluating the polynomial with Estrin's scheme. |
|
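To make the Estrin idea concrete, here is a rough sketch in C++. It is not the code from the header under discussion: the degree (7), the coefficient table `c`, and the names `poly_horner`, `poly_estrin` and `self_check` are all made up for illustration. Horner evaluates the polynomial as one serial chain of dependent FMAs, while Estrin groups terms into independent sub-expressions, which shortens the dependency chain at the cost of a couple of extra multiplies. Whether that is actually faster depends on the FMA latency and throughput of the target, and the two schemes can round slightly differently in general.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical coefficients for illustration only; c[0] is the constant term.
static const float c[8] = {1.0f,    0.5f,     0.25f,     0.125f,
                           0.0625f, 0.03125f, 0.015625f, 0.0078125f};

// Horner's scheme: 7 FMAs, each depending on the previous result.
inline float poly_horner(float x) {
    float r = c[7];
    for (int i = 6; i >= 0; --i) {
        r = std::fma(r, x, c[i]);
    }
    return r;
}

// Estrin's scheme for degree 7: the four first-level FMAs are independent
// of each other, as are the two second-level ones, so the dependency chain
// is shorter than in Horner's scheme.
inline float poly_estrin(float x) {
    const float x2 = x * x;
    const float x4 = x2 * x2;
    const float p01 = std::fma(c[1], x, c[0]);      // c0 + c1*x
    const float p23 = std::fma(c[3], x, c[2]);      // c2 + c3*x
    const float p45 = std::fma(c[5], x, c[4]);      // c4 + c5*x
    const float p67 = std::fma(c[7], x, c[6]);      // c6 + c7*x
    const float p0123 = std::fma(p23, x2, p01);     // degree-3 low half
    const float p4567 = std::fma(p67, x2, p45);     // degree-3 high half
    return std::fma(p4567, x4, p0123);              // low + x^4 * high
}

// Sanity check: both schemes should agree to within a small tolerance.
inline void self_check() {
    const float xs[] = {-1.0f, -0.5f, 0.0f, 0.25f, 0.5f, 1.0f};
    for (float x : xs) {
        const float h = poly_horner(x);
        const float e = poly_estrin(x);
        assert(std::fabs(h - e) <= 1e-5f * std::fmax(1.0f, std::fabs(h)));
    }
}
```

Both functions are branch-free, so a loop such as `for (i = 0; i < n; ++i) y[i] = poly_estrin(x[i]);` is a reasonable candidate for auto-vectorization at typical optimization levels, though that is worth confirming in the generated assembly for the compiler and flags in use.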