|
|
|
|
|
by eiz
1163 days ago
|
|
> 4. Describing positional embeddings as multiplicative. They are generally (and very counterintuitively to me, but nevertheless) additive with token embeddings. Worth noting that rotary position embeddings, used in many recent architectures (LLaMA, GPT-NeoX, ...), are very similar to the original sin/cos position embedding in the transformer paper but using complex multiplication instead of addition |
|