by ffast-math 1458 days ago
There's definitely a tradeoff between speed and accuracy. We characterize this for various problems in the paper (https://arxiv.org/pdf/2106.10860.pdf), but the tl;dr is that, at a given level of error, you get a bigger speedup when there's more redundancy in your matrices.

A back-of-the-envelope calculation suggests that this won't beat tensor cores on NVIDIA GPUs. That's basically because ~half the die is an ASIC for dense (and 2:4 sparse) matmuls, with no support for the sparsity structure we induce. If 1:16 sparsity were supported, or if there were a batched warp_shuffle instruction, we'd get speedups on GPUs similar to the ones we get on CPUs.
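
To make the "1:16" structure concrete, here's a toy sketch in plain Python. It is not the paper's code. It assumes that "1:16 sparsity" means exactly one nonzero (a 1) in each group of 16 consecutive entries of an encoded row, and the names (one_hot_row, dense_product, lookup_product) are made up for illustration. It shows that a dense product against such a row gives the same answer as gathering one table row per group and adding them, which is roughly the work a dense matmul unit can't skip.

```python
import random

K = 16  # entries per group; "1 of 16" is the assumed meaning of 1:16 sparsity


def one_hot_row(codes, k=K):
    """Expand one code per group into a row with exactly one 1 per group of k."""
    row = []
    for code in codes:
        group = [0] * k
        group[code] = 1
        row.extend(group)
    return row


def dense_product(row, table):
    """Plain dense row-times-matrix product; it multiplies by every zero too."""
    n_out = len(table[0])
    return [sum(row[i] * table[i][j] for i in range(len(row)))
            for j in range(n_out)]


def lookup_product(codes, table, k=K):
    """Same result via gathers: pick one table row per group and add them up."""
    n_out = len(table[0])
    out = [0] * n_out
    for group, code in enumerate(codes):
        picked = table[group * k + code]
        for j in range(n_out):
            out[j] += picked[j]
    return out


# Hypothetical toy sizes: 4 groups of 16, 3 output columns, integer entries
# so the two results can be compared exactly.
rng = random.Random(0)
n_groups, n_out = 4, 3
codes = [rng.randrange(K) for _ in range(n_groups)]
table = [[rng.randint(-9, 9) for _ in range(n_out)]
         for _ in range(n_groups * K)]

row = one_hot_row(codes)
assert dense_product(row, table) == lookup_product(codes, table)
```

In this toy, the dense version does 16x as many multiply-adds as the lookup version does additions, and 15 of every 16 are multiplications by zero.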