by Dorialexander
930 days ago
Yes. I think we may have enough for "full finetuning", largely erasing the previous knowledge. But that's still very far from enough for pretraining. "RomeGPT" is next on my list of Monad successors, and to give you a general idea, we have on the order of tens of millions of words in classical Latin (and the biggest source will be… Augustine). There was a Latin BERT project that managed to collect roughly 500 million words in total, mostly early modern and modern Latin. By comparison, I'm currently part of a project to pretrain a French model, and we need… 140 billion words.