by KeplerBoy 828 days ago
Compiling without optimization flags could be a big part of the reason.

Haven't looked closely at the code or tried it, but with -O3, -fopenmp and a well-placed pragma the performance could increase many-fold.
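To make that concrete, here is a rough sketch of what "a well-placed pragma" tends to look like. The loop is a hypothetical stand-in (a simple saxpy), not the code under discussion, and the name `saxpy_omp` is made up for illustration. It assumes the hot loop has independent iterations; if it doesn't, the pragma alone won't be safe.

```cpp
// Hypothetical stand-in for the hot loop, not the code from the thread.
// Build with something like: g++ -O3 -fopenmp saxpy.cpp
#include <cassert>
#include <vector>

// Computes y[i] = a * x[i] + y[i], splitting iterations across threads.
void saxpy_omp(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(y.size());
    // Iterations are independent, so they can be divided among threads.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

How much this buys depends on the loop: -O3 mostly helps through inlining and vectorization, while the pragma only pays off once there is enough work per thread to amortize the threading overhead.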

Heck, with NVC++ you could offload that thing to a GPU with minimal effort and have it flying at the memory bandwidth limit.
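One way the offload could look, as a sketch only: this assumes the OpenMP target route (`nvc++ -O3 -mp=gpu`), which is just one of the options NVC++ offers; `-stdpar=gpu` with standard parallel algorithms is another. The loop and the name `saxpy_offload` are again hypothetical stand-ins.

```cpp
// Hypothetical stand-in loop, not the code from the thread.
// Build with something like: nvc++ -O3 -mp=gpu saxpy.cpp
#include <cassert>
#include <vector>

// Computes y[i] = a * x[i] + y[i] on the target device if one is available.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(y.size());
    const float* xp = x.data();  // raw pointers so the map clauses can name array sections
    float* yp = y.data();
    // Copy x to the device, copy y in both directions, and spread the loop over GPU threads.
    // Compilers without offload support run this loop on the host instead.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (long i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Caveat: written like this, every call copies the arrays over the bus, and that transfer will likely dominate. To actually get near the GPU's memory bandwidth, the data has to stay resident on the device across calls.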