| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rsp1984 1084 days ago

This looks quite amazing at first glance but looking a bit deeper some elephant-in-the-room questions pop up:

1. Simple Gradient Descent is already linear in time w.r.t. # of parameters (but it is an iterative method). It seems to be missing from Table I too. This method requires waiting for equilibration, so could it be seen as another form of iterative method? If so, wouldn't the proper comparison be against known O(n) first-order iterative methods like GD as opposed to exact methods (O(n^3)) or pseudo-2nd-order iterative methods like CG (O(n^2))?

2. In 2.B.3 the paper says "Note that this includes a compilation step that scales as O(d2)". I think this needs some clarification. Is this saying that, in order to run this on actual hardware, there's a compilation step involved that is O(n^2)? Of course O notation says nothing about linear factors but wouldn't that contradict what's stated in Table I?

1 comments

aifer4 1084 days ago

You're right, that iterative methods may be linear in time with respect to the number of parameters. In this paper, we provide a method which is linear in time with respect to the dimension (d) of the vector (x) we are solving for in the equation A x = b. The number of parameters is d^2 + d, as there are d^2 parameters for the matrix A and d for the vector b. Gradient descent would require a matrix-vector multiplication just to compute the gradient, which is already O(d^2).

You make an important point, regarding the compilation. In this case, we are talking about the time it takes to upload the matrix A and vector b to the hardware. This requires O(d^2) numbers to be updated, but assuming it is done in parallel it could be done in O(d) time, and the coefficient of this scaling is independent of the physical parameters of the analog hardware. For this reason, in the analysis of the algorithm, we are generally ignoring the time to update the parameters, as is clarified in the Methods section.

rsp1984 1084 days ago

OK, got it. I think what you're describing is Gradient Descent on the Normal Equations to solve an overdetermined linear system. Indeed in such a system dim(x) == dim(b) == d. Matrix A is fixed though and not part of the estimation but you're correct about the complexity of gradient computation which is indeed O(d^2).

Thanks for the clarification of the uploading / compilation step.