| HN Mirror



	by nick3443 677 days ago
	This isn't really the challenge (loss function) that language models are trained on. It's not a simple next-word challenge; the model gets more context than just the preceding words. See how BERT was trained for reference.
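To make the contrast concrete, here is a minimal sketch of the two objectives, assuming whitespace-split words instead of the WordPiece subword tokens BERT actually uses. A next-word objective only ever conditions on the words to the left. BERT's masked-LM objective hides some tokens and asks the model to recover them from context on both sides. The masking follows the scheme described in the BERT paper: about 15% of positions are selected, and of those 80% become `[MASK]`, 10% a random token, and 10% stay unchanged. Selecting each token independently with probability 15% is a simplification. The function names `next_token_pairs` and `bert_style_mask` are hypothetical, and details such as `[CLS]`/`[SEP]` and next-sentence prediction are omitted.

```python
import random

MASK = "[MASK]"


def next_token_pairs(tokens):
    """Next-word objective: predict tokens[i] from the words before it only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def bert_style_mask(tokens, vocab, rng, mask_prob=0.15):
    """Masked-LM objective (simplified): hide some tokens, predict them from both sides.

    Returns (inputs, targets). targets[i] holds the original token at positions
    selected for prediction and None elsewhere (no loss at those positions).
    """
    inputs = list(tokens)
    targets = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: this position contributes nothing to the loss
        targets[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80% of selected positions: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged in the input


    return inputs, targets


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for context, target in next_token_pairs(sentence):
        print(context, "->", target)
    inputs, targets = bert_style_mask(sentence, vocab=sentence, rng=random.Random(0), mask_prob=0.5)
    print(inputs)
    print(targets)
```

In the masked version, a hidden word such as "sat" can be inferred from "cat" on its left and "on the mat" on its right. The next-word version never sees the right-hand context.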