	by Scene_Cast2 358 days ago
	Should be in the RoPE paper. The OG transformer used additive sinusoidal embeddings (the position vector is summed with the token embedding), while RoPE does a pairwise rotation of the query and key vectors. There's also NoPE; I think SmolLM3 "uses NoPE" (i.e. doesn't use any positional stuff) every fourth layer.
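
	To make the difference concrete, here is a minimal pure-Python sketch, not any library's actual implementation. The function names (`add_sinusoidal`, `rope`, `uses_rope`), the base of 10000, the adjacent-pair layout, and the 0-indexed "every fourth layer" schedule are all assumptions made for illustration; real models differ in how they pair dimensions and count layers.

```python
import math

def sinusoidal_embedding(pos, dim, base=10000.0):
    """Position vector with interleaved sin/cos, one frequency per pair."""
    emb = []
    for i in range(0, dim, 2):
        angle = pos / (base ** (i / dim))
        emb.extend([math.sin(angle), math.cos(angle)])
    return emb

def add_sinusoidal(x, pos):
    """Original-Transformer style: add the position vector to the embedding."""
    pe = sinusoidal_embedding(pos, len(x))
    return [xi + pi for xi, pi in zip(x, pe)]

def rope(x, pos, base=10000.0):
    """RoPE style: rotate each adjacent pair (x[i], x[i+1]) by a
    position-dependent angle. Applied to queries and keys, not values."""
    dim = len(x)
    out = []
    for i in range(0, dim, 2):
        angle = pos / (base ** (i / dim))
        c, s = math.cos(angle), math.sin(angle)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out

def uses_rope(layer_idx, nope_every=4):
    """Hypothetical schedule: skip positional encoding on every fourth layer
    (0-indexed layers 3, 7, 11, ...)."""
    return (layer_idx + 1) % nope_every != 0

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Same relative offset (3) at two different absolute positions.
rope_near = dot(rope(q, 5), rope(k, 2))
rope_far = dot(rope(q, 105), rope(k, 102))
add_near = dot(add_sinusoidal(q, 5), add_sinusoidal(k, 2))
add_far = dot(add_sinusoidal(q, 105), add_sinusoidal(k, 102))

print("RoPE score, near vs far:", rope_near, rope_far)
print("Additive score, near vs far:", add_near, add_far)
print("Layers with RoPE:", [i for i in range(8) if uses_rope(i)])
```

	The point of the rotation is that the RoPE query-key score depends only on the relative offset between the two positions, so `rope_near` and `rope_far` agree up to floating-point error, and the rotation leaves vector norms unchanged. The additive variant carries absolute position through cross terms between content and position, so it has no such guarantee.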