Hacker News
DroPE: Extending the Context of LLMs by Dropping Their Positional Embeddings (pub.sakana.ai)
5 points by hardmaru 152 days ago
1 comment

> While the original motivation for causal masking was not to provide positional information, but instead to have efficient parallelizable training, it turns out that a consistent <bos> token + causal masking is enough to perfectly reconstruct token positions.

I wish this point were explained further instead of being relegated to a footnote. It seems to be the central insight that makes this technique work, and it is not obvious to me, perhaps because I haven't implemented a transformer from scratch.
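A toy example can make the footnote's claim concrete. What follows is a minimal sketch under stated assumptions, not the DroPE authors' construction. It assumes a single attention head with no positional embeddings whose query-key scores are all equal, so that after causal masking each position simply averages over itself and every earlier token. It also assumes a hypothetical value channel that is 1 for <bos> and 0 for every other token. Position t (with <bos> at t = 0) then receives 1/(t + 1). That is a strictly decreasing function of t, so it can be inverted to recover the index. The function names causal_uniform_attention and recover_positions are illustrative only.

```python
import numpy as np

def causal_uniform_attention(values: np.ndarray) -> np.ndarray:
    """One attention head with no positional embeddings (toy assumption).

    Every query-key score is 0 (as if the query/key projections were zero),
    so after the causal mask, position t averages the values at 0..t.
    """
    seq_len = values.shape[0]
    scores = np.zeros((seq_len, seq_len))
    # Causal mask: position t may not attend to any position > t.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax; exp(-inf) == 0, so masked entries get zero weight.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def recover_positions(seq_len: int) -> np.ndarray:
    # Hypothetical 1-d value channel: 1.0 for <bos> at index 0 and 0.0 for
    # every other token, regardless of that token's content.
    values = np.zeros((seq_len, 1))
    values[0, 0] = 1.0
    out = causal_uniform_attention(values)[:, 0]  # out[t] == 1 / (t + 1)
    # Invert the monotone map t -> 1/(t + 1) to get the position back.
    return np.rint(1.0 / out - 1.0).astype(int)

print(recover_positions(8))  # [0 1 2 3 4 5 6 7]
```

The consistent <bos> is what makes this work. It is the one token guaranteed to sit at the same position with the same value. Without it, the average would mix in content-dependent values and would not encode position reliably. Real heads need not learn exactly this pattern. Resolution also drops as t grows, because 1/(t + 1) and 1/(t + 2) get closer together.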