Hacker News new | ask | show | jobs
by FabHK 1756 days ago
So, do I understand this correctly:

0. The problem you're really looking at is quickly computing

  a'B
for many different vectors a (of length D) which are conceptually random.

1. To do that, you approximate AB, where A is very long and thin (NxD), and B (DxM). (Note: this problem has complexity O(NDM) naively, or, if N=D=M, O(N^3), and there are smarter algorithms that reduce the exponent 3 down to 2.8 (Strassen) or even further down, though those are not practical from what I gather. Oh, and note that at minimum, you're looking at time (N+M)D, because you need to look at every element, unless you approximate).

2. You approximate AB by f(g(A),h(B)), such that the error (in "relative Frobenius norm", ie norm of the error divided by norm of the actual product) is "small" with "high probability".

3. You're given B and a "typical" A^~ (ie training data), and then have two steps:

3.1. Do some training mumbo jumbo (fast?), and

3.2. now finally you can quickly compute the approximate AB (or a'B) for "new" A (or a'), presumably fast?

Quite cool that there are new techniques (and apparently fast, the article claims 100x faster than exact, and 10x faster than existing approximations) for such a classic problem!

But, if I understand correctly, it could happen with a certain probability that the actual error is quite high? Maybe if your a' is untypical?

ETA: From what I understand, in many ML applications people use 32bit or even 16bit floats, so high accuracy is not as important as speed anyway?

1 comments

Yes, basically correct. A couple notes/clarifications for other readers:

- The rows a of A are "random," but in the sense of being drawn from some distribution for which we have a training set--not in the sense of, e.g., "assume everything is gaussian" or some other idealized case.

- The training takes a few seconds to maybe a couple minutes, depending on your training set size. But it does have to be done ahead of time--you can't train after receiving the matrices and still get a speedup. Unless maybe the matrices are unrealistically large.

- The error in the approximation is provably not much larger than the error on the training set with high probability (under standard ML assumptions--finite variance, training and test distributions match), but there's no hard limit on it in general. If you know the range of values in your individual vectors, you could obtain a hard limit, although it would be much looser than what you see in practice.

- Correct that many ML applications tolerate large losses in precision. But caveat is that we won't get the right answer to a full 16 bits in each element (this would be ~99.999% accuracy), but instead more like 90-99.9%, depending on the speed-quality tradeoff chosen. So we can't claim it will be sufficiently accurate in all cases