| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by seurimas 737 days ago
	Positional encoding is standard for transformers of all stripes. They introduce a seemingly novel, redundant positional encoding scheme. It's more difficult to train, but seems to enable producing multiple tokens at once (i.e. you could get an answer that is N tokens long in N/x steps instead N steps).