Hacker News
by jimmyl02 435 days ago
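large context windows generally build on RoPE[0] (rotary position embeddings), which encodes position by rotating the query/key vectors so that attention scores depend on relative offsets. combined with scaling tricks such as position interpolation, that lets a model train on a smaller window and stretch to a larger one at inference. it seems like they have a new "iRoPE", which might have better performance?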
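
rough sketch of the idea in plain python, in case it helps. this is not any particular model's implementation: `rope`, `dot`, and the `scale` parameter are names made up for illustration. `scale` stands in for position interpolation, which is one known way to stretch the window and is not part of the original RoPE paper. nothing here says anything about how "iRoPE" works.

```python
import math

def rope(vec, pos, base=10000.0, scale=1.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`.

    `scale` is a hypothetical knob for position interpolation: values
    below 1 squeeze long positions back into the trained range.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)       # per-pair frequency, as in the RoPE paper
        angle = pos * scale * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# same relative offset (3) at very different absolute positions
near = dot(rope(q, 5), rope(k, 2))
far = dot(rope(q, 105), rope(k, 102))

# assumed setup: trained on 2048 positions, run at 8192, so scale = 2048 / 8192
stretched = rope(q, 8000, scale=2048 / 8192)
in_range = rope(q, 2000)
```

`near` and `far` agree up to floating point error, which is the relative-position property. `stretched` matches `in_range`, i.e. with this scaling position 8000 gets the rotation the model saw for position 2000 during training.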

[0] https://arxiv.org/pdf/2104.09864