by wintom
3589 days ago
In the post you mentioned that "In those tasks training from scratch with this model architecture does not do as well as some other techniques we're researching, but it serves as a baseline." Can you elaborate a little on that? Is the training the problem, or is the model just not good at longer texts?