Hacker News
by paradite, 473 days ago
My burning question: why not also make a slightly larger model (100B) that could perform even better?
Is there some bottleneck that prevents RL from scaling up performance on larger non-MoE models?
2 comments
t1amat, 473 days ago
See QwQ-Max-Preview:
https://qwenlm.github.io/blog/qwq-max-preview/
buyucu, 473 days ago
They have a larger model that is in preview and still training.