Thanks for this reference, I will look it up. Though, in my experience, people in NLP still (by default) train from scratch, with some exceptions for tasks on the same dataset:
This is true, but it is changing rapidly. In addition to fine-tunable language models, you can do deep feature extraction with something like bert-as-service [0] ... You can even fine-tune BERT on your data, then use the fine-tuned model as a feature extractor.
[0] https://github.com/hanxiao/bert-as-service
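
To make the feature-extraction idea concrete, here is a minimal sketch of the pattern: a frozen encoder turns text into vectors, and a small model is trained on top. Everything in it is an assumption for illustration. `toy_encode` is a hypothetical stand-in (hashed bag-of-words) so the snippet runs without a server, and the nearest-centroid classifier is just the simplest downstream model I could pick. With bert-as-service you would, as far as I understand its README, pass something like `BertClient().encode` in place of `toy_encode`.

```python
import math
from collections import defaultdict


def toy_encode(texts, dim=16):
    """Hypothetical stand-in for a frozen encoder: text -> unit-length vector.

    Assumption: a real setup would swap this for a sentence encoder such as
    the encode method of a bert-as-service client.
    """
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Deterministic bucket per token (sum of character codes).
            vec[sum(ord(c) for c in token) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def fit_centroids(encode, texts, labels):
    """Train the downstream model: one mean feature vector per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(encode(texts), labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(encode, centroids, texts):
    """Assign each text the label whose centroid has the largest dot product."""
    predictions = []
    for vec in encode(texts):
        best = max(
            centroids,
            key=lambda label: sum(a * b for a, b in zip(vec, centroids[label])),
        )
        predictions.append(best)
    return predictions


# The encoder is never updated; only the centroids are learned.
centroids = fit_centroids(
    toy_encode, ["great movie", "terrible movie"], ["pos", "neg"]
)
print(predict(toy_encode, centroids, ["great movie", "terrible movie"]))
```

The point of passing `encode` as an argument is that the downstream code does not care whether the features come from an off-the-shelf model or from one you fine-tuned on your own data first.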