by Majromax
69 days ago
Well, the "vector field defined by the update attributable to this training sample" is a well-defined thing (even if it's not just the gradient of the loss with respect to the parameters), so that part translates. What's harder to interpret is how this field transports with respect to θ, since the momentum vector and θ are themselves inextricably linked: if you somehow arrived at a different θ, you'd have a different momentum. (On the gripping hand, the bracket is a construct of infinitesimals, so maybe that doesn't matter.)
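One rough way to poke at this numerically is to treat the optimizer state as the pair (θ, m) rather than θ alone, apply two samples in both orders, and look at the gap. The sketch below is only an illustration under assumptions: a heavy-ball style update, made-up quadratic per-sample losses, and hypothetical names (`step`, `order_gap`). It measures the finite-step analogue of the bracket, not the infinitesimal bracket itself.

```python
import numpy as np

def grad(theta, sample):
    # Assumed per-sample loss: 0.5 * theta^T A theta - b^T theta
    A, b = sample
    return A @ theta - b

def step(state, sample, lr=0.1, beta=0.9):
    # Assumed heavy-ball update; the state is (theta, momentum) together.
    theta, m = state
    m_new = beta * m + grad(theta, sample)
    theta_new = theta - lr * m_new
    return theta_new, m_new

def order_gap(state, s1, s2, lr=0.1, beta=0.9):
    # Apply s1 then s2, and s2 then s1, from the same starting state,
    # and return the difference in theta and in momentum.
    t12, m12 = step(step(state, s1, lr, beta), s2, lr, beta)
    t21, m21 = step(step(state, s2, lr, beta), s1, lr, beta)
    return t12 - t21, m12 - m21

if __name__ == "__main__":
    state = (np.array([1.0, 0.0]), np.zeros(2))
    s1 = (np.diag([1.0, 2.0]), np.zeros(2))
    s2 = (np.array([[2.0, 1.0], [1.0, 1.0]]), np.array([1.0, 0.0]))
    d_theta, d_m = order_gap(state, s1, s2)
    print("theta gap:", d_theta)
    print("momentum gap:", d_m)
```

Setting `beta=0` reduces this update to plain SGD, which gives a baseline to compare the momentum case against.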