by negativeonehalf
627 days ago
See their ISPD 2022 paper, which goes into more detail about the value of pre-training (e.g. Figure 7): https://dl.acm.org/doi/pdf/10.1145/3505170.3511478

Sometimes training from scratch is able to match the results of pre-training, given ~5X more time to converge. Other times, though, it never does as well as a pre-trained model, converging to a worse final result. This isn't too surprising -- the whole point of the method is to be able to learn from experience.