|
|
|
|
|
by adrian_b
596 days ago
|
|
Unlike the cache line size that is known, you cannot predict how the DRAM address bits are mapped to CPU or GPU addresses and which is the page size of the DIMMs that happen to be used. Matching the DRAM characteristics influences much less the performance of a linear algebra algorithm than matching the cache line size and the sizes of the L1 and L2 cache memories. If you really want to tune an algorithm for the maximum performance that can be achieved on a particular computer, you must use an automatic tuning method, which runs benchmarks of that algorithm varying all matrix layout parameters until finding optimum values. An example of how this is done is provided by the ATLAS library that implements the BLAS API. Such a library tuned for a given computer should be then used only for that computer. |
|