| HN Mirror

	by espadrine 1131 days ago
	Thanks a lot! I always felt weird about positional embeddings, because positions are not a set, they’re a continuum. My initial guess for why they don’t extrapolate was that, once a few computations or layers are applied, the extrapolated embeddings step on the others’ turf, confusing the model about order, as if random concepts were inserted here and there. (Overfitting to position seems like it would also play a part, though.) Have you experimented with nonlinear biases?
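
	To make the question concrete, here is a rough sketch of what I mean by a nonlinear bias. It assumes the biases being discussed are penalties added to the attention logits that grow with query-key distance; the function names, the slope of 0.5, and the log shape are all just illustrative choices of mine, not anything from the parent's work.

```python
import math

def linear_bias(distance, slope=0.5):
    # Assumed baseline: penalty grows linearly with query-key distance.
    return -slope * distance

def log_bias(distance, slope=0.5):
    # Hypothetical nonlinear variant: penalty grows logarithmically,
    # so far-away keys are penalized less steeply.
    return -slope * math.log1p(distance)

def causal_bias_matrix(seq_len, bias_fn):
    # Row i is the query position, column j the key position.
    # Keys in the future (j > i) are masked with -inf.
    return [
        [bias_fn(i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(seq_len, bias_fn):
    # Content logits are taken to be zero, so only the bias shapes
    # the attention distribution.
    return [softmax(row) for row in causal_bias_matrix(seq_len, bias_fn)]

if __name__ == "__main__":
    lin = attention_weights(8, linear_bias)
    log = attention_weights(8, log_bias)
    print("linear, last query:", [round(w, 3) for w in lin[-1]])
    print("log,    last query:", [round(w, 3) for w in log[-1]])
```

	With zero content logits, the log version leaves more weight on distant tokens than the linear one, which is the kind of difference I am curious about for extrapolation.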