| HN Mirror



	by spott 359 days ago
	All the Llamas have done it (well, 2 and 3, and I believe 1; I don't know about 4). I think they have a citation for it, though it might just be the RoPE paper (https://arxiv.org/abs/2104.09864). I'm not actually aware of any model that doesn't apply positional embeddings on a per-layer basis (except BERT and the original transformer paper; I haven't read the GPT-2 paper in a while, so I'm not sure about that one either).
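
	To make "per-layer" concrete, here is a rough pure-Python sketch of the rotation described in the RoPE paper, assuming the paper's adjacent-pair layout and its default base of 10000. The names `rope`, `dot`, and `attention_score` are made up for illustration and are not taken from any Llama codebase; real implementations work on batched tensors and may pair dimensions differently.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair of `vec` by an angle that grows with `pos`."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        # Pair i/2 uses frequency base^(-i/d), as in the RoPE paper.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_score(q, k, q_pos, k_pos):
    # Hypothetical single-head score. In a per-layer scheme, every
    # attention layer rotates its own q and k like this, instead of
    # adding a position vector once at the input embedding.
    return dot(rope(q, q_pos), rope(k, k_pos))
```

	The useful property is that the score depends only on the offset between the two positions, so `attention_score(q, k, 3, 1)` and `attention_score(q, k, 10, 8)` agree up to floating-point error.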

1 comment

Thanks! I'm not super up to date on all the ML stuff :)