by jimmyl02
435 days ago
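the large context windows generally involve RoPE[0] (rotary position embeddings), which encodes each token's position as a rotation of its query/key vectors. combined with scaling tricks on the rotation frequencies, that lets the training window be smaller than the window used during inference. it seems like they have a new "iRoPE" which might have better performance?

[0] https://arxiv.org/pdf/2104.09864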
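for anyone who wants to see the rotation idea concretely, here is a minimal pure-Python sketch of plain RoPE from [0], not of "iRoPE" and not any particular model's implementation. the names `rope` and `dot`, the toy 4-dimensional vectors, and the chosen positions are all made up for illustration; the base of 10000 is the value used in the paper.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `pos`."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        # Per-pair frequency base^(-i/d); i steps by 2, matching the paper's schedule.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (hypothetical values, for illustration only).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset (3) at very different absolute positions.
near = dot(rope(q, 5), rope(k, 2))
far = dot(rope(q, 105), rope(k, 102))
print(abs(near - far) < 1e-9)
```

the attention score only depends on the offset between the two positions, not on where they sit in the sequence. that relative-position property is what the context-extension tricks lean on when they rescale the angles.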