by jey
530 days ago
Generally yes, vanilla (full-batch) gradient descent gets plenty of use. But for LLMs, no, it's not really used: computing the gradient over the entire dataset at every step is impractical at that scale, and the noise in stochastic gradient descent provides a form of regularization, so it probably works better in addition to being more practical.
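
To make the distinction concrete, here is a rough sketch of the two update rules on a toy 1-D least-squares problem. Everything in it (the synthetic data, the function names `full_batch_gd` and `minibatch_sgd`, the learning rate and batch size) is made up for illustration and has nothing to do with how any actual LLM is trained. It only shows the mechanical difference: full-batch descent looks at every example per step, while SGD uses a small random sample and so takes noisier steps. It does not try to demonstrate the regularization effect.

```python
import random

def grad(w, batch):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over the given examples.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def full_batch_gd(data, lr=0.5, steps=200):
    # "Vanilla" gradient descent: every step uses the whole dataset.
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

def minibatch_sgd(data, lr=0.5, steps=200, batch_size=8, seed=0):
    # Stochastic gradient descent: every step uses a small random sample,
    # so each gradient is a cheap but noisy estimate of the full one.
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        w -= lr * grad(w, batch)
    return w

# Hypothetical synthetic data: y = 3x plus a little Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(256):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 0.1)))

w_gd = full_batch_gd(data)
w_sgd = minibatch_sgd(data)
print("full-batch GD:", w_gd)
print("minibatch SGD:", w_sgd)
```

Both should land near the true slope of 3, but each SGD step touches 8 examples instead of 256. That per-step cost gap is the "more practical" part, and it only grows with dataset size.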