HN Mirror


by adrian_b 596 days ago

Unlike the cache line size, which is known, you cannot predict how CPU or GPU addresses are mapped to DRAM address bits, or what the page size is of the DIMMs that happen to be installed.
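To illustrate the "known" half of that contrast, here is a minimal sketch of how a program might look up the cache line size at runtime. It assumes a Linux system: the `sysconf` name and the sysfs path below are Linux/glibc conventions and may be missing elsewhere, in which case the hypothetical helper `cache_line_size` just returns `None`. There is no comparably simple query for the DRAM address mapping.

```python
import os


def cache_line_size():
    """Return the L1 data cache line size in bytes, or None if unknown.

    Assumption: Linux. Other systems may expose neither source.
    """
    # Source 1: glibc's sysconf variable, when Python knows the name.
    name = "SC_LEVEL1_DCACHE_LINESIZE"
    if hasattr(os, "sysconf") and name in getattr(os, "sysconf_names", {}):
        try:
            value = os.sysconf(name)
        except (OSError, ValueError):
            value = -1
        if value > 0:
            return value

    # Source 2: the sysfs cache description for CPU 0.
    path = "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


if __name__ == "__main__":
    print("cache line size:", cache_line_size())
```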

Matching the DRAM characteristics influences the performance of a linear algebra algorithm much less than matching the cache line size and the sizes of the L1 and L2 caches.

If you really want to tune an algorithm for the maximum performance achievable on a particular computer, you must use an automatic tuning method, which benchmarks the algorithm while varying all matrix layout parameters until it finds the optimum values. The ATLAS library, which implements the BLAS API, is an example of how this is done.

A library tuned this way for a given computer should then be used only on that computer.
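The auto-tuning idea above can be sketched in a few lines. This is only a toy illustration, not how ATLAS itself is implemented: it tunes a single layout parameter (the tile size of a blocked matrix multiply) by timing each candidate and keeping the fastest. The function names `blocked_matmul` and `autotune_block`, the candidate list and the matrix size are all assumptions made for the example, and pure Python is used for readability, so the timings say little about real hardware behaviour.

```python
import random
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n row-major matrices (flat lists) tile by tile."""
    c = [0.0] * (n * n)
    for ii in range(0, n, block):
        i_end = min(ii + block, n)
        for kk in range(0, n, block):
            k_end = min(kk + block, n)
            for jj in range(0, n, block):
                j_end = min(jj + block, n)
                # Work on one tile of C using one tile each of A and B.
                for i in range(ii, i_end):
                    row = i * n
                    for k in range(kk, k_end):
                        a_ik = a[row + k]
                        row_b = k * n
                        for j in range(jj, j_end):
                            c[row + j] += a_ik * b[row_b + j]
    return c


def autotune_block(n=64, candidates=(4, 8, 16, 32, 64), repeats=3, seed=0):
    """Benchmark every candidate tile size and return (fastest, timings)."""
    rng = random.Random(seed)
    a = [rng.random() for _ in range(n * n)]
    b = [rng.random() for _ in range(n * n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            # Keep the minimum over repeats to reduce timing noise.
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best_block, timings = autotune_block()
    for block, seconds in sorted(timings.items()):
        print(f"block={block:3d}  {seconds * 1e3:8.2f} ms")
    print("selected block size:", best_block)
```

The selected value is only meaningful on the machine that ran the benchmark, which is why the result of such a search should not be carried over to a different computer.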

1 comment

muziq 594 days ago

Yes... and no: you can inspect and measure the SDRAM component at runtime to best determine how object sizes will be allocated. That is kind of what I was getting at, and what I have spent the last month implementing ;)
