Hacker News
by helloplanets 5 days ago
The part about positional encoding is not correct.

> The intuition: instead of adding position info to each token’s vector, RoPE rotates the vector by an angle that depends on its position

You can't rotate the token's entire vector (or all three vectors; what is being implied is unclear). You rotate only each token's Query and Key vectors, so that the dot product between token 1's Query vector and token 2's Key vector reflects how far apart the two tokens are.
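Here is a minimal sketch of that point in plain Python. It assumes the pairwise rotation scheme from the original RoPE (RoFormer) paper with base 10000. The vectors and the helper names `rope_rotate` and `dot` are hypothetical, made up for illustration. Because only the query and key are rotated, the attention score depends on the offset between the two positions, not on where they sit absolutely:

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[2j], vec[2j+1]) by position * base**(-2j/d).

    Assumed pairwise RoPE scheme; applied to query and key vectors only.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):  # i = 2j, so -i/d == -2j/d
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]   # hypothetical query vector of one token
k = [1.1, 0.4, -0.7, 0.9]   # hypothetical key vector of another token

# Same relative offset (3), different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))

# The two scores agree up to floating-point rounding.
print(score_a, score_b)
```

The value vectors are left unrotated in this scheme, so the content that attention mixes together is not altered by position; only the Query/Key scores are.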

Positional embedding should be explained after the Query, Key and Value vectors. Because the article explains those only afterwards, the reader builds on a wrong intuition, and it gets confusing.


Yep, you’re correct. I got to that bit and thought that can’t be right. It’s obviously wrong. If you rotate a semantic vector, you change the semantics of it. You don’t want that.

Makes me wonder if the whole thing is just slop.

Is the rest of the article correct?

Can anyone suggest an alternative article?

The repetition of some phrases/statements means it's either poorly edited, machine generated, or both.
Could you restate this another way? I don't follow.