by credit_guy
1570 days ago
The random directions don’t have unit length. They are drawn from a multivariate normal, so they have equal variance along every direction, including the direction of the gradient. The descent along the orthogonal directions somehow cancels out on average, and the descent along the gradient direction somehow becomes more efficient; I don’t know why yet.
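
A minimal numerical sketch of the cancellation, under an assumption the comment does not spell out: each step is taken along a random direction v drawn from N(0, I) and scaled by the directional slope (grad · v). The function name `average_directional_step` and the example gradient are hypothetical, chosen only for illustration. Because E[v vᵀ] = I for a standard normal, the average of (grad · v) v should approach grad itself, with the orthogonal components averaging to roughly zero.

```python
import numpy as np

def average_directional_step(grad, n_samples, seed=0):
    """Average of (grad . v) * v over directions v ~ N(0, I).

    Assumed update rule (not stated in the comment): step along v,
    scaled by the directional derivative of the objective along v.
    """
    rng = np.random.default_rng(seed)
    # One random direction per row; these are NOT normalized to unit length.
    v = rng.standard_normal((n_samples, grad.size))
    # Directional slope of the objective along each sampled direction.
    slopes = v @ grad
    # Each sample contributes slope * direction; average over all samples.
    return (slopes[:, None] * v).mean(axis=0)

# Hypothetical gradient pointing along the first axis only.
grad = np.array([3.0, 0.0, 0.0, 0.0])
avg = average_directional_step(grad, 200_000)
print(avg)  # close to grad; the orthogonal components are near zero
```

This only illustrates that the orthogonal contributions cancel in expectation; it does not by itself explain any gain in efficiency. If the directions were instead normalized to unit length in d dimensions, the same average would shrink to grad / d, which is one reason the lack of normalization matters.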