| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by wizzwizz4 323 days ago
	Transformer models are excellent at translation, but next-token prediction is not the correct architecture for it. You want something more like seq2seq. Next token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than faithfulness.