by marcinzm
630 days ago
This is what the repo says:
>Results
>Ariane RISC-V CPU
>View the full details of the Ariane experiment on our details page. With this code we are able to get comparable or better results training from scratch as fine-tuning a pre-trained model.

The paper includes a graph showing that it takes longer for Ariane to train without pre-training; however, the results in the end are the same.
Sometimes training from scratch is able to match the results of pre-training, given ~5X more time to converge. Other times, though, it never does as well as a pre-trained model, converging to a worse final result.
This isn't too surprising -- the whole point of the method is to be able to learn from experience.