by malwrar
1032 days ago
Curious, what led you to adjust the parameters this way? Also, have you guys experimented with ALiBi[1], which claims better extrapolative results than rotary positional encoding?

[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you’re skimming)