by leod
2607 days ago
Thank you so much for your comprehensive answer; this helps a lot. If I understand nshepperd's code correctly, it uses a small, constant learning rate. Do you know whether this works better than the learning rate schedule usually used for Transformer models (https://www.tensorflow.org/alpha/tutorials/text/transformer_...)?
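For reference, here is a minimal sketch of the two options I am comparing. It assumes the "usual" schedule is the linear warmup followed by inverse-square-root decay from the original Transformer paper, with that paper's defaults (`d_model=512`, `warmup_steps=4000`). The function names are my own, and the constant value `1e-4` is only a placeholder, not a number taken from nshepperd's code.

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Linear warmup, then inverse-square-root decay.

    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    Defaults follow the original Transformer paper (an assumption here).
    """
    step = max(step, 1)  # avoid dividing by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def constant_lr(step, lr=1e-4):
    """Constant learning rate; 1e-4 is a placeholder value, not from any repo."""
    return lr


if __name__ == "__main__":
    # Print both rates at a few training steps for a side-by-side comparison.
    for step in (1, 1000, 4000, 16000, 100000):
        print(f"{step:>6}  scheduled={transformer_lr(step):.2e}  "
              f"constant={constant_lr(step):.2e}")
```

With these defaults the scheduled rate rises until step 4000 and then decays, so it is only roughly constant late in training; whether that matters for fine-tuning is exactly what I am unsure about.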