Hacker News new | ask | show | jobs
by desku 1922 days ago
Here's a paper on how BERT (a large Transformer model trained using self-supervised learning) implicitly learns the traditional NLP pipeline: https://arxiv.org/abs/1905.05950