by nick3443 677 days ago
This isn't really the challenge (loss function) that language models are trained on. It's not a simple next-word challenge; they get more context. See how BERT was trained for reference: it predicts masked-out words using the context on both sides of each gap.
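To make the difference concrete, here is a rough sketch of how training examples could be built for each objective. The function names and the `mask_prob` parameter are made up for illustration, and the masking is simplified: as far as I know, real BERT picks about 15% of tokens and only swaps most of them for `[MASK]` (the rest are replaced with a random token or left unchanged), and it was also trained with a next-sentence-prediction task that is not shown here.

```python
import random

MASK = "[MASK]"

def next_word_examples(tokens):
    """Left-to-right objective: predict each token from the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_examples(tokens, mask_prob=0.15, seed=0):
    """Masked objective (simplified): hide some tokens and predict them
    from everything that remains, on both the left and the right."""
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}  # position -> original token the model must recover
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets

if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for context, target in next_word_examples(sentence):
        print(context, "->", target)
    print(masked_examples(sentence, mask_prob=0.3))
```

In the first case the model only ever sees what came before the target word. In the second it sees the whole sentence minus the gaps, which is the "more context" part.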