by KeplerBoy
828 days ago
The lack of optimization flags could be a big part of the reason. I haven't looked closely at the code or tried it, but with -O3, -fopenmp, and a well-placed pragma, the performance could increase many-fold. Heck, with NVC++ you could offload that thing to a GPU with minimal effort and have it flying at the memory-bandwidth limit.
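To make "a well-placed pragma" concrete, here is a minimal sketch. It is not the code under discussion (which I haven't looked at); `saxpy` is a hypothetical stand-in for whatever hot loop that code has, and it assumes the loop's iterations are independent of each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (an assumption, not the original code):
// computes y[i] = a * x[i] + y[i] for every element.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());

    // Each iteration touches only its own element, so the loop can be
    // split across threads. If the compiler is not given an OpenMP flag,
    // the pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang, something like `g++ -O3 -fopenmp` enables both the optimizer and the pragma. A loop this simple is usually limited by memory bandwidth rather than arithmetic, so how much you actually gain depends on the machine and on what the real loop does.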