Hacker News
by MasterScrat 859 days ago
And how does Mistral do "accurate long content retrieval"?
1 comment

See the long-range performance section here: https://arxiv.org/pdf/2401.04088.pdf
I don't think they explain it in the paper, do they? It seems they just mention the result. I'm really curious to know too. Maybe they're implying it's a result of their Mixture of Experts architecture? Or maybe they just don't want to say, idk.