Hacker News
by twobitshifter, 859 days ago
See the long-range performance piece here:
https://arxiv.org/pdf/2401.04088.pdf
1 comment
Johnyjohnson123, 858 days ago
I don't think they explain it in the paper, do they? They just mention the result, it seems. I'm really curious to know too. Maybe they're implying it's a result of their Mixture of Experts architecture? Or maybe they just don't want to say, I don't know.