Hacker News
by dantodor, 459 days ago
Try using Qwen. A later paper showed that pre-training influences the bump they get via RL.