by jg8610
3374 days ago
Interestingly, there's a nice intuitive explanation for why SGD can be better than GD. If you compute the gradient over all of the data for every step, you're spending computation on redundant data. You'll reach the minimum having processed less data if you take a step as soon as you have useful information.
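
To make that concrete, here's a rough sketch (a toy of my own, not a benchmark): a 1-D least-squares fit on a deliberately redundant dataset, counting how many per-example gradient evaluations each method needs to get the loss below a tolerance. The function names, the learning rate, the tolerance, and the dataset are all assumptions picked for illustration. The full-data loss is computed only to decide when to stop, and is not counted as gradient work.

```python
import random

def make_data(n_unique=10, copies=100, true_w=3.0):
    # Deliberately redundant: each of n_unique points is repeated `copies` times.
    xs = [(i + 1) / n_unique for i in range(n_unique)]
    return [(x, true_w * x) for x in xs] * copies

def loss(w, data):
    # Mean squared error of the model y = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad_one(w, x, y):
    # Gradient of (w * x - y)^2 with respect to w, for a single example.
    return 2 * (w * x - y) * x

def run_gd(data, lr=0.25, tol=1e-6):
    # Full-batch GD: every step touches every example, duplicates included.
    w, evals = 0.0, 0
    while loss(w, data) > tol:
        g = sum(grad_one(w, x, y) for x, y in data) / len(data)
        evals += len(data)
        w -= lr * g
    return w, evals

def run_sgd(data, lr=0.25, tol=1e-6, seed=0):
    # SGD: step after each single randomly chosen example.
    rng = random.Random(seed)
    w, evals = 0.0, 0
    while loss(w, data) > tol:
        x, y = rng.choice(data)
        w -= lr * grad_one(w, x, y)
        evals += 1
    return w, evals

if __name__ == "__main__":
    data = make_data()
    print("GD  (w, gradient evals):", run_gd(data))
    print("SGD (w, gradient evals):", run_sgd(data))
```

On this toy problem the duplicates add no new information, so GD pays for all of them on every step while SGD does not. The data is also noise-free, which flatters SGD: with noisy targets and a fixed step size it would hover around the minimum rather than settle on it.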