by danielhanchen
809 days ago
My main concerns right now: I asked for a transformer benchmark to see whether this works on transformers, but didn't get a response. They also seem particularly focused on CNN-type benchmarks, yet did not benchmark against superconvergence + Ranger21 + the learning rate range finder, even though they explicitly said Schedule-Free needs tuning as well. Their past research on D-Adaptation (which won an ICML 2023 best paper award) and their follow-up work Prodigy both did similar to or worse than AdamW, so maybe this works on CNNs but not on transformers. For CNNs, though, we already have superconvergence. I'll wait for their paper, which should come in 1-2 months.