
by helloplanets 5 days ago

The part about positional encoding is not correct.

> The intuition: instead of adding position info to each token’s vector, RoPE rotates the vector by an angle that depends on its position

RoPE doesn't rotate the token's entire vector (or all three of its Query, Key and Value vectors; it's unclear which the article is implying). Only each token's Query and Key vectors are rotated, so that the dot product of token 1's Query vector with token 2's Key vector depends on how far apart the two tokens are.
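To make that concrete, here's a rough sketch in plain Python. It assumes a single 2-D pair and a made-up frequency `theta`; real RoPE applies many such pairs at different frequencies across the head dimension. The vectors and the helper names `rotate` and `dot` are invented for illustration, not taken from the article or any library.

```python
import math

def rotate(vec, pos, theta=0.5):
    """Rotate a 2-D vector by the angle pos * theta (theta is an assumed frequency)."""
    angle = pos * theta
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

def dot(a, b):
    """Plain 2-D dot product, i.e. the attention score before scaling and softmax."""
    return a[0] * b[0] + a[1] * b[1]

q = (1.0, 2.0)   # made-up Query vector
k = (0.5, -1.0)  # made-up Key vector
# The Value vector is never rotated, so it is left out here.

# Same offset (3 positions apart) at two very different absolute positions.
score_near = dot(rotate(q, 5), rotate(k, 2))
score_far = dot(rotate(q, 105), rotate(k, 102))

# A different offset (1 position apart) gives a different score.
score_other = dot(rotate(q, 5), rotate(k, 4))

print(math.isclose(score_near, score_far, abs_tol=1e-9))
print(math.isclose(score_near, score_other, abs_tol=1e-9))
```

The first comparison holds because rotating the Query by one angle and the Key by another leaves only the difference of the two angles in the dot product, so the absolute positions cancel out and only the offset matters.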

Positional encoding should be explained after the Query, Key and Value vectors. Because the article only introduces those afterwards, the reader builds on a wrong intuition and it gets confusing.

2 comments

rhubarbtree 5 days ago

Yep, you’re correct. I got to that bit and thought that can’t be right. It’s obviously wrong. If you rotate a semantic vector, you change the semantics of it. You don’t want that.

Makes me wonder if the whole thing is just slop.

Is the rest of the article correct?

Can anyone suggest an alternative article?


aaroninsf 4 days ago

The repetition of some phrases/statements means it's either poorly edited, machine generated, or both.


giardini 5 days ago

Could you restate this another way? I don't follow.
