by philkuz 1075 days ago
The caveat buried in the abstract is that this beats BERT and non-pretrained Transformers. GPT-style models should still do better, though naturally at a higher computational cost.
1 comment

Gzipping every query together with every training example can get even more expensive.
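
To make the cost concrete, here is a rough sketch of what a gzip-based nearest-neighbor classifier might look like. It is not the paper's code; it assumes normalized compression distance (NCD) as the similarity measure, and the names `clen`, `ncd`, and `classify` are made up for illustration.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings (assumed metric)."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # the pair has to be compressed together
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Majority vote over the k training examples closest to the query."""
    # Every query is compared against every training example.
    dists = sorted((ncd(query, text), label) for text, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
train = [
    ("the team won the match in overtime", "sports"),
    ("the striker scored twice in the final", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stocks fell after the earnings report", "finance"),
]
label = classify("the goalkeeper saved a penalty in the match", train, k=1)
```

The compressed lengths of the individual training texts could be cached, but the concatenated pair cannot, so in this sketch each query still costs at least one gzip call per training example.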