Hacker News
by
snippyhollow
1023 days ago
We changed RoPE's theta from 10k to 1M and fine-tuned on 16k-token sequences.
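For anyone unfamiliar with the knob, here is a minimal sketch of what raising theta does. It assumes the standard RoPE frequency formula, theta^(-2i/d), and a head dimension of 128. The thread states neither the head dimension nor the exact implementation, so `rope_inv_freq`, `pairs_wrapping_within` and `head_dim` are illustrative names and values only.

```python
import math


def rope_inv_freq(head_dim, theta):
    """Rotation frequency of each dimension pair: theta^(-2i/d)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def pairs_wrapping_within(head_dim, theta, context_len):
    """Count dimension pairs that complete a full rotation within context_len tokens."""
    wavelengths = [2 * math.pi / f for f in rope_inv_freq(head_dim, theta)]
    return sum(1 for w in wavelengths if w < context_len)


head_dim = 128        # assumed; not stated in the thread
context_len = 16_384  # the 16k fine-tuning length mentioned above

for theta in (10_000.0, 1_000_000.0):
    wrapped = pairs_wrapping_within(head_dim, theta, context_len)
    print(f"theta={theta:>11,.0f}: {wrapped} of {head_dim // 2} pairs wrap within {context_len} tokens")
```

A larger theta slows the rotation of every pair except the first, so fewer pairs wrap around within the same context window. That is one common intuition for why the change helps with longer sequences, not a claim about the authors' reasoning.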
malwrar
1023 days ago
Curious, what led you to adjust the parameters this way? Also, have you guys experimented with ALiBi [1], which claims better extrapolation than rotary positional encoding?
[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you’re skimming)
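For comparison, a rough sketch of the ALiBi idea from [1]: instead of rotating queries and keys, each head adds a linear penalty to the attention scores, proportional to the query-key distance. The slope schedule below is the geometric sequence the paper describes for power-of-two head counts. The helper names `alibi_slopes` and `alibi_bias` are illustrative, and the `-inf` entries are an ordinary causal mask that this sketch adds for completeness.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (assumes num_heads is a power of two)."""
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores: -slope * (query_pos - key_pos) for visible keys."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
for row in bias:
    print(row)
```

Because the penalty depends only on the distance between positions, the same bias formula applies unchanged at sequence lengths longer than those seen in training. That property is what the extrapolation claim rests on.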
ttul
1023 days ago
Undoubtedly, they have tried ALiBi…