by oneseven 1098 days ago
It seems like learned positional encodings would still prevent you from fine-tuning on a larger context size, though, so maybe using ALiBi is still relevant (although I have not read that paper).
1 comments

You can collapse all positions beyond a certain length into a single bucket, like T5 does.
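
To make the bucketing idea concrete, here is a minimal sketch. It is a simplification: it clamps absolute positions so that everything past the trained length reuses the last learned embedding, whereas T5 itself buckets relative distances (with log-spaced buckets for larger distances). The names `clamp_position`, `bucketed_positions`, and `max_position` are made up for illustration.

```python
def clamp_position(position: int, max_position: int) -> int:
    """Map any position at or beyond max_position to the last bucket."""
    return min(position, max_position - 1)


def bucketed_positions(seq_len: int, max_position: int) -> list[int]:
    """Position ids for a sequence, with the overflow collapsed into one bucket."""
    return [clamp_position(i, max_position) for i in range(seq_len)]


# A model trained with 4 position embeddings, run on a 6-token sequence:
print(bucketed_positions(6, 4))  # [0, 1, 2, 3, 3, 3]
```

The embedding table never gets indexed out of range this way, but every token past the cutoff looks identical position-wise, so you would probably still want some fine-tuning at the longer length.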