by wintom 3589 days ago

In the post you mentioned that:

>>"In those tasks training from scratch with this model architecture does not do as well as some other techniques we're researching, but it serves as a baseline."

Can you elaborate a little on that? Is training the problem, or is the model just not good at longer texts?