| HN Mirror



	by ofirpress 1131 days ago
	(I wrote ALiBi) Thanks for posting this! You can view a video where I explain what we did and why it's useful at: https://www.youtube.com/watch?v=Pp61ShI9VGc

4 comments

espadrine 1131 days ago

Thanks a lot! I always felt weird about positional embeddings, because positions are not a set, they’re a continuum. My initial guess for why they don’t extrapolate was that the extrapolated embeddings step on the others’ turf once a few computations or layers are applied, confusing the model about order, as if random concepts were inserted here and there. (Overfitting to position does seem like it would play a part too, though.)

Have you experimented with nonlinear biases?

Eridrus 1130 days ago

Is ALiBi still the SOTA for this setting, or have there been advances beyond it in the last 8 months? I know there has been a lot of interest in longer context lengths recently.

ipsum2 1129 days ago

xpos is SoTA right now: https://arxiv.org/pdf/2212.10554.pdf

Eridrus 1128 days ago

Thanks!

zuzun 1131 days ago

If I understand it correctly, you are only attending to preceding tokens in your paper. Can the constant bias matrix be made symmetric for unmasked tasks?
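
To make the question concrete, here is a minimal sketch of the two bias matrices being compared. The causal version follows the usual description of ALiBi (a per-head linear penalty on the distance to each earlier key). The symmetric version is only one possible reading of the question, using the absolute distance, and is an assumption rather than something the paper or this thread confirms. The function names are hypothetical, and the slope formula assumes the number of heads is a power of two.

```python
def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence starting at 2^(-8/num_heads).

    Assumes num_heads is a power of two.
    """
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def causal_bias(seq_len, slope):
    """Bias added to attention scores before softmax, causal case.

    Query i sees key j <= i with penalty -slope * (i - j);
    future keys (j > i) are masked out with -inf.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def symmetric_bias(seq_len, slope):
    """Hypothetical symmetric variant for unmasked attention (an assumption):
    penalize by absolute distance so that bias[i][j] == bias[j][i].
    """
    return [
        [-slope * abs(i - j) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slope = alibi_slopes(8)[0]  # first head of an 8-head model
    for row in causal_bias(4, slope):
        print(row)
    for row in symmetric_bias(4, slope):
        print(row)
```

Either matrix is fixed (not learned) and is simply added to the query-key scores for the corresponding head. One caveat with the symmetric sketch: it cannot tell whether a key is to the left or to the right of the query, which may or may not matter for a given task.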

jerpint 1131 days ago

I’m curious as to whether this inductive bias wouldn’t hurt on tasks where the first sentence of a long corpus would contain the most useful information.

Nonetheless, very clever trick and congrats on the great paper!
