|
|
|
|
|
by ffast-math
1749 days ago
|
|
Primary author here. Happy to answer questions! Also, feel free to email me at the address in the paper if you're interested in talking about it in more detail. E.g., I've already heard from some hardware folks looking at expanding on this work. |
|
0. The problem you're really looking at is quickly computing
for many different vectors a (of length D) which are conceptually random.1. To do that, you approximate AB, where A is very long and thin (NxD), and B (DxM). (Note: this problem has complexity O(NDM) naively, or, if N=D=M, O(N^3), and there are smarter algorithms that reduce the exponent 3 down to 2.8 (Strassen) or even further down, though those are not practical from what I gather. Oh, and note that at minimum, you're looking at time (N+M)D, because you need to look at every element, unless you approximate).
2. You approximate AB by f(g(A),h(B)), such that the error (in "relative Frobenius norm", ie norm of the error divided by norm of the actual product) is "small" with "high probability".
3. You're given B and a "typical" A^~ (ie training data), and then have two steps:
3.1. Do some training mumbo jumbo (fast?), and
3.2. now finally you can quickly compute the approximate AB (or a'B) for "new" A (or a'), presumably fast?
Quite cool that there are new techniques (and apparently fast, the article claims 100x faster than exact, and 10x faster than existing approximations) for such a classic problem!
But, if I understand correctly, it could happen with a certain probability that the actual error is quite high? Maybe if your a' is untypical?
ETA: From what I understand, in many ML applications people use 32bit or even 16bit floats, so high accuracy is not as important as speed anyway?