Hacker News
by dantodor 459 days ago
Try using Qwen. A later paper showed that pre-training influences how big a bump they get from RL.