by navjordj 1160 days ago
The original T5 models were trained with masked language modelling, similar to BERT, and afterwards were also trained on an autoregressive task, similar to GPT.
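To make the difference between the two objectives concrete, here is a rough sketch of how one sentence could be turned into an (input, target) pair under each. It assumes the masked objective is T5-style span corruption with sentinel tokens named `<extra_id_N>` (the naming used by the Hugging Face T5 tokenizer) and that the autoregressive task is a simple prefix/continuation split. The helper names `span_corrupt` and `prefix_lm` and the hand-picked spans are made up for illustration; real training samples spans randomly and works on subword ids, not whole words.

```python
def span_corrupt(tokens, spans):
    """Masked-LM-style objective: replace each (start, end) span in the
    input with a sentinel, and have the target spell out the dropped spans.
    `spans` must be sorted and non-overlapping (an assumption of this sketch)."""
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # hide the span behind a sentinel
        targets.append(sentinel)             # target names the sentinel...
        targets.extend(tokens[start:end])    # ...then the hidden tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


def prefix_lm(tokens, split):
    """Autoregressive-style objective: condition on a prefix and
    predict the rest of the sequence left to right."""
    return tokens[:split], tokens[split:]


tokens = "Thank you for inviting me to your party last week".split()

corrupt_in, corrupt_out = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(corrupt_in))
print(" ".join(corrupt_out))

lm_in, lm_out = prefix_lm(tokens, 5)
print(" ".join(lm_in))
print(" ".join(lm_out))
```

In the first case the model only has to fill in the hidden spans while seeing context on both sides; in the second it has to generate the whole continuation from the prefix alone, which is closer to how GPT-style models are trained.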