|
|
|
|
|
by leecarraher
1750 days ago
|
|
For some reason they run all their tests single threaded. Seems like parallelism is where all computing hardware is inevitably going. I also wish they had run time comparisons to more recent matrix sketching and multiplication method such as frequent directions, newer FJLT implementations, and RIP matrices based on hashing. |
|
As far as single threaded, there's a simple answer and a subtle one. The simple answer is that we consider the core subroutine a single thread would run in a multithreaded context, not how to do the multithreading. These are basically orthogonal issues since matrix multiplication is embarrassingly parallel and our method would parallelize just like any other. More details in appendix E2.
The subtler answer is that we could do even better in the multithreaded context if you could fuse the encoding step with whatever earlier code produces the larger matrix. This is a result of matrix products becoming memory-bandwidth-bound in the limit of sufficient cores, combined with our approach reducing the size of the matrix by a huge amount.