by rsaxvc 1104 days ago
Ages ago had a mathematician shard image processing across cores.

Started with a pair of loops for each image, foreach col, foreach row.

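Problem: C++ 2D arrays are row-major (row-col, not col-row), so with col as the outer loop the inner loop strides across rows instead of walking contiguous memory.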

Halfway through, multi-threaded performance was much worse than single-threaded.

Eventually we switched to row-col processing. Single-threaded was then fast enough, so we went back to two loops per image.
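
For anyone who hasn't hit this before, here is a rough sketch of the two loop orders. It is not the original code: the `Image` struct, the function names, and the per-pixel work (a plain sum) are made up for illustration, and it assumes an 8-bit image stored row-major in one flat buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: 8-bit grayscale, stored row-major like a C++ 2D
// array, so pixel (row, col) lives at index row * width + col.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;
};

// foreach col, foreach row: every inner-loop step jumps `width` bytes ahead,
// so consecutive accesses land on different rows of the buffer.
std::uint64_t sum_col_then_row(const Image& img) {
    assert(img.pixels.size() == img.width * img.height);
    std::uint64_t total = 0;
    for (std::size_t col = 0; col < img.width; ++col) {
        for (std::size_t row = 0; row < img.height; ++row) {
            total += img.pixels[row * img.width + col];
        }
    }
    return total;
}

// foreach row, foreach col: the inner loop walks consecutive addresses,
// which matches the row-major layout.
std::uint64_t sum_row_then_col(const Image& img) {
    assert(img.pixels.size() == img.width * img.height);
    std::uint64_t total = 0;
    for (std::size_t row = 0; row < img.height; ++row) {
        for (std::size_t col = 0; col < img.width; ++col) {
            total += img.pixels[row * img.width + col];
        }
    }
    return total;
}
```

Both functions return the same result; only the memory access order differs. How much that matters depends on image size and hardware, so time both on your own data before drawing conclusions.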