by nick3443 677 days ago
This isn't really the challenge (loss function) that language models are trained on. It's not a simple next-word challenge; they get more context. For reference, see how BERT was trained: it fills in masked-out words using the text on both sides.
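To make the contrast concrete, here's a rough sketch of how the two objectives turn a sentence into training examples. The function names and the toy sentence are made up for illustration, and the masked version is simplified: real BERT pretraining picks roughly 15% of positions at random and doesn't always substitute `[MASK]`, whereas here the positions are passed in explicitly so the output is deterministic.

```python
MASK = "[MASK]"

def next_word_examples(tokens):
    """Left-to-right objective: predict each token from only the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, positions):
    """BERT-style objective (simplified): hide the chosen positions and
    predict them from the surrounding tokens on both sides."""
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return inputs, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Next-word: the context for "sat" is only what came before it.
for context, target in next_word_examples(tokens):
    print(context, "->", target)

# Masked: the context for "sat" includes the words after it as well.
print(masked_example(tokens, {2}))
```

In the masked case the model sees "on the mat" when guessing "sat", which is the extra context the next-word setup doesn't give it.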