by PaulHoule 1229 days ago
I was thinking RoBERTa 3, Longformer, or Big Bird would be a good choice for this, though any limit on the attention window is a weakness.