
I do not see what the relationship is between the Ozaki algorithms and algorithms that are supposedly "matrix free".

The Ozaki scheme and its variants improve the precision of matrix-matrix multiplications, allowing a multiplication done with lower-precision operations to approach the precision of the same multiplication done with higher-precision operations.
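To make the slicing idea concrete, here is a toy sketch, not the Ozaki scheme itself. It splits each FP64 input into FP32 slices, multiplies the slices pairwise, and accumulates the partial products. The helper names (`to_f32`, `split_f32`, `dot_f32`, `dot_sliced`) are made up for illustration, and Python's native FP64 stands in for the wider accumulator; a real implementation chooses the slices so that the low-precision products and sums on the matrix unit are error-free.

```python
import struct

def to_f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def split_f32(x, pieces=2):
    """Split x into FP32 slices whose sum approximates x."""
    slices, rest = [], x
    for _ in range(pieces):
        s = to_f32(rest)
        slices.append(s)
        rest -= s  # remainder not yet captured by the slices
    return slices

def dot_f32(a, b):
    """Dot product with every input, product and sum rounded to FP32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(to_f32(x) * to_f32(y)))
    return acc

def dot_sliced(a, b, pieces=2):
    """Dot product built from products of FP32 slices, summed in FP64."""
    acc = 0.0
    for x, y in zip(a, b):
        for xs in split_f32(x, pieces):
            for ys in split_f32(y, pieces):
                # Two 24-bit significands multiply exactly in FP64.
                acc += xs * ys
    return acc

a = [1 / 3, 2 / 7, 5 / 11]
b = [3 / 13, 7 / 17, 11 / 19]
reference = sum(x * y for x, y in zip(a, b))  # plain FP64 result

err_naive = abs(dot_f32(a, b) - reference)
err_sliced = abs(dot_sliced(a, b) - reference)
print("FP32 only error:", err_naive)
print("sliced error:   ", err_sliced)
```

A matrix-matrix multiplication is just many such dot products, so the same splitting applies to whole matrices, at the cost of several low-precision multiplications per high-precision one.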

So it is an improvement for matrix-matrix operations, which are better done in matrix units. It is not any kind of "matrix free" algorithm.

The Ozaki scheme is not good enough for emulating FP64 on a GPU with poor FP64 throughput but good FP32 throughput. The reason is that FP64 matters not only for its greater precision but also for its much greater dynamic range compared with FP32. In computations with FP64, overflows and underflows are extremely rare events and easy to avoid. In complex physical simulations done in FP32, on the other hand, overflows and underflows cannot be avoided without extremely cumbersome and frequent rescalings, which eliminate all the advantages of using floating-point numbers instead of fixed-point numbers.
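The dynamic-range point can be checked with a small sketch that uses only the Python standard library. The helpers `to_f32` and `f32_mul` are made up for illustration; they emulate FP32 by rounding FP64 results through `struct`, which raises `OverflowError` when a value does not fit in FP32.

```python
import struct

def to_f32(x):
    """Round an FP64 value to FP32; return +/-inf if it overflows FP32."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

def f32_mul(x, y):
    """Multiply two values as if inputs and result were stored in FP32."""
    return to_f32(to_f32(x) * to_f32(y))

# Intermediate products that are unremarkable in FP64 ...
big_f64 = 1e20 * 1e20      # about 1e40, far below the FP64 limit (~1.8e308)
small_f64 = 1e-30 * 1e-30  # about 1e-60, far above the FP64 underflow limit

# ... fall outside the FP32 range (roughly 1e-45 to 3.4e38).
big_f32 = f32_mul(1e20, 1e20)      # overflows to infinity
small_f32 = f32_mul(1e-30, 1e-30)  # underflows to zero

print(big_f64, big_f32)
print(small_f64, small_f32)
```

Splitting the significand into more slices does not help here, because the problem is in the exponent: the value itself is not representable, whatever the precision.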

I do not know which kind of "matrix free" algorithms for FEA you are referring to.

Nevertheless, the problem of any "matrix free" algorithm is exactly its poor scaling, because any such algorithm must perform roughly as many memory transfers as computational operations. This bounds its performance by the memory bandwidth, which prevents scaling.

The advantage of the algorithms based on matrices is exactly the better scaling, because only such algorithms can do more computational operations than memory transfers, so their scaling is no longer limited by the memory interface.
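The scaling argument can be put in numbers with an idealized model of arithmetic intensity (operations per byte moved). The function names are made up for illustration, and the model assumes dense n x n operands in FP64 with each operand read or written exactly once. The matrix-vector product serves only as a stand-in for a memory-bound kernel; whether a particular "matrix free" method behaves this way is the claim made above, not something this sketch demonstrates.

```python
def matmul_intensity(n, bytes_per_elem=8):
    """Operations per byte for C = A @ B with n x n matrices (idealized)."""
    flops = 2 * n**3                           # n^2 dot products of length n
    bytes_moved = 3 * n * n * bytes_per_elem   # read A and B, write C
    return flops / bytes_moved

def matvec_intensity(n, bytes_per_elem=8):
    """Operations per byte for y = A @ x with an n x n matrix (idealized)."""
    flops = 2 * n * n                                # n dot products of length n
    bytes_moved = (n * n + 2 * n) * bytes_per_elem   # read A and x, write y
    return flops / bytes_moved

for n in (1200, 2400, 4800):
    print(n, matmul_intensity(n), matvec_intensity(n))
```

In this model the intensity of the matrix-matrix product grows linearly with n, while that of the matrix-vector product stays below 0.25 operations per byte no matter how large n gets, so only the former can keep faster compute units busy.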

For implementing matrix-matrix operations, the matrix units introduced initially by NVIDIA and then by AMD, Apple, Intel and, starting next year, Arm are preferable to vector units, because they reduce even further the number of memory transfers that prevent scaling.