by danbruc 1053 days ago
Does sparse mean anything other than "we cannot actually do as many FP8 operations per second as we just claimed"? To me it sounds like they can do X matrix operations per second on sparse matrices using Y FP8 operations per second, but instead of just saying what Y is, they tell us how many FP8 operations would be required if the matrices were not sparse. Is this pure marketing bullshit or is there some logic to this? How sparse do those matrices have to be? Or am I misunderstanding this claim?

It means a very specific sparsity pattern, 2:4: in every group of 4 consecutive values, at most 2 are nonzero. It's not pure bullshit, because a matrix with 2:4 sparsity may represent more "information" than a dense matrix half the size.
Okay, yes, there is a bit more information than in a matrix with half the number of entries, namely the positions of the zeros. But when it comes to the number of floating point operations, doubling that number seems at least somewhat questionable to me; they are not performing that many multiplications. On the other hand, it would probably be hard if not impossible to achieve the same performance if one tried to exploit this sparsity manually and skip the multiplications, so from that angle maybe it is not too unreasonable.
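
To make the counting argument concrete, here is a rough sketch in plain Python of how a 2:4 row could be stored as two kept values per group plus their positions, and how a dot product over that form touches only half the entries. The names `compress_2_of_4` and `sparse_dot` are made up for illustration, and this is not how the hardware actually lays out its data; it only shows where the factor of two comes from.

```python
def compress_2_of_4(row):
    """Store a 2:4-sparse row as two kept values per group plus their positions."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        nonzero = [i for i, v in enumerate(group) if v != 0]
        if len(nonzero) > 2:
            raise ValueError("group has more than 2 nonzero entries")
        # Pad with zero-valued slots so every group stores exactly two entries.
        spare = [i for i in range(4) if i not in nonzero]
        kept = sorted(nonzero + spare[:2 - len(nonzero)])
        values.extend(group[i] for i in kept)
        positions.extend(kept)
    return values, positions


def sparse_dot(values, positions, dense):
    """Dot product that multiplies only the stored entries and counts them."""
    total, multiplies = 0.0, 0
    for k, (v, p) in enumerate(zip(values, positions)):
        group_start = (k // 2) * 4  # two stored entries per group of four
        total += v * dense[group_start + p]
        multiplies += 1
    return total, multiplies


row = [0.0, 3.0, 0.0, -1.0, 2.0, 0.0, 0.0, 5.0]
dense = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
values, positions = compress_2_of_4(row)
result, multiplies = sparse_dot(values, positions, dense)
print(values, positions, result, multiplies)
```

For this 8-element row the sketch performs 4 multiplications where a dense dot product would perform 8, and the result is the same. The "sparse" headline figure counts the 8.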

But this also made me wonder: how does one use this in practice? If the matrices are not tiny, they will probably have to be incredibly sparse in order to always have at least two zeros in every group of four entries. So does this just set some entries to zero if there are not enough of them in each group of four? Or does one have to ensure this oneself, reordering rows and columns and introducing zeros where required and acceptable?
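
As far as I understand it, the pattern is usually imposed on the weights ahead of time rather than found by luck: in each group of four, the two smallest-magnitude entries are set to zero, and the network is typically fine-tuned afterwards to recover accuracy. A minimal sketch of that pruning step in plain Python, assuming simple magnitude-based selection (the function name `prune_2_of_4` is hypothetical, not a library call):

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude entries in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.1, -0.9, 0.4, 0.3, 0.0, 0.0, 0.7, 0.0]
pruned = prune_2_of_4(weights)
print(pruned)
```

In the first group the entries 0.1 and 0.3 are dropped; the second group already satisfies the pattern and comes through unchanged. So the matrix does not need to be extremely sparse to begin with. The zeros are introduced deliberately, at the cost of some approximation error.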